Hi James,

Thanks for responding.

A single 2 MB file is taking ~5 hours to process with 
AggregatePlainTextProcessor. This is the command I run, including the JVM 
memory arguments:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java 
-Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp 
"C:\New_Drive\apache-ctakes-4.0.0-bi
apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
 -Dlog4j.
nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
 -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame

Also, just now I tried to process the file with the AE 
AggregatePlaintextFastUMLSProcessor, but ran into a different problem: an 
authentication error, even though I am using the same username and password 
as with AggregatePlainTextProcessor.

I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and -Xmx5g. Could 
you please let me know how it is possible that AggregatePlainTextProcessor 
runs fine with the above username and password, while 
AggregatePlaintextFastUMLSProcessor gives the exception below with the same 
username and password?

Exception:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX

org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
        at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
        at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
        at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
        at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
        at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
        ... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
        ... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
        ... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
        ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
        ... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
        at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
        ... 33 more
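
In the UmlsUserApprover lines above, the user name is printed with a trailing 
"-ß". It is not clear from the log alone whether that is part of the value the 
JVM received or just how the message is formatted. One way to check might be a 
small standalone class that reads the same ctakes.umlsuser system property 
used on the command line and prints its characters as code points. This is 
only a sketch: the class name UmlsPropertyCheck and both helper methods are 
made up for illustration and are not part of cTAKES.

```java
public class UmlsPropertyCheck {

    // Hypothetical helper: render each character as U+XXXX so that hidden
    // or non-ASCII characters in the value become visible.
    public static String describe(String value) {
        StringBuilder sb = new StringBuilder();
        value.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Hypothetical helper: true if every character is visible ASCII
    // (0x21..0x7E), i.e. no whitespace, control or non-ASCII characters.
    // A false result is only a hint to look closer, not proof of a bad value.
    public static boolean isPlainAscii(String value) {
        return value.codePoints().allMatch(cp -> cp >= 0x21 && cp <= 0x7E);
    }

    public static void main(String[] args) {
        // Same property name as the -Dctakes.umlsuser argument shown above.
        String user = System.getProperty("ctakes.umlsuser");
        if (user == null) {
            System.out.println("ctakes.umlsuser is not set");
            return;
        }
        System.out.println("length      = " + user.length());
        System.out.println("code points = " + describe(user));
        System.out.println("plain ASCII = " + isPlainAscii(user));
    }
}
```

Running it from the same console, with the same -Dctakes.umlsuser="..." 
argument as the cTAKES command, would show whether the value arrives in the 
JVM exactly as typed. The password is deliberately not read or printed.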




From: James Masanz [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 8:56 PM
To: [email protected]
Subject: Re: Slowness in processing files

Using AggregatePlaintextFastUMLSProcessor  is much faster than 
AggregatePlainTextProcessor, so I suggest that to start with you just use 
AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at times, 
or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow results.
Try increasing it to 5G (or more) using the JVM parameter -Xmx5G
For faster start up, you can also set -Xms to the same value as -Xmx, or to 
something close to it.
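
To confirm that the heap settings are actually picked up, a tiny standalone 
class like the one below could be run with the same -Xms/-Xmx arguments. It is 
a minimal sketch using the standard java.lang.Runtime API; the class name 
HeapCheck and the helper toMegabytes are made up for illustration and are not 
part of cTAKES.

```java
public class HeapCheck {

    // Hypothetical helper: convert a byte count to whole megabytes (MiB).
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use (governed by -Xmx);
        // totalMemory() is the heap currently reserved (starts near -Xms).
        System.out.println("Max heap (MB):     " + toMegabytes(rt.maxMemory()));
        System.out.println("Current heap (MB): " + toMegabytes(rt.totalMemory()));
    }
}
```

For example: java -Xms5g -Xmx5g HeapCheck. The reported maximum can be 
somewhat lower than the -Xmx value, depending on the JVM and garbage 
collector, so treat the numbers as approximate.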

 -- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:
Hi All,

When medical records are run with either AggregatePlaintextFastUMLSProcessor 
or AggregatePlainTextProcessor as the AE, processing is very slow. It is 
pretty fast when smaller files (~2 KB) are fed as input, but with bigger 
files, say 2 MB, it is very slow and the files take ~5 hours to process. Any 
pointers will be of great help.

Regards,
Harish.
