Hi James,

Thanks for responding.
A single 2 MB file takes ~5 hours to process with AggregatePlainTextProcessor. This is the command I use, including the JVM memory arguments:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame

Also, just now I tried to process the file with the AE AggregatePlaintextFastUMLSProcessor, but ran into a different problem: an authentication error, even though I used the same user name and password that work with AggregatePlainTextProcessor. I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and -Xmx5g, but could you please let me know how the same user name and password can work with AggregatePlainTextProcessor and fail with the exception below when used with AggregatePlaintextFastUMLSProcessor?

Exception:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58 INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00 INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00 INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß: ....
13 Dec 2017 21:05:02 ERROR UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
	at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
	at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
	at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
	at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
	at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
	at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
	at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
	at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
	at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
	at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
	at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
	... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator " failed. (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
	at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
	at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
	at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
	at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
	at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
	at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
	at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
	at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
	at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
	... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
	at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
	... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
	at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
	... 25 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.lang.reflect.Constructor.newInstance(Unknown Source)
	at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
	... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
	at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
	... 33 more

From: James Masanz [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 8:56 PM
To: [email protected]
Subject: Re: Slowness in processing files

Using AggregatePlaintextFastUMLSProcessor is much faster than AggregatePlainTextProcessor, so I suggest that to start with you just use AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at times, or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow results. Try increasing it to 5G (or more) using the JVM parameter -Xmx5G. For faster startup, you can also set -Xms to the same value as -Xmx, or to something close to it.

-- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:

Hi All,

When the medical records are run with the AE AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor, the processing is very slow. It is pretty fast when smaller files (~2 KB) are fed as input, but with bigger files, say 2 MB, it is very slow and the files take ~5 hours to process.

Any pointer will be of great help.

Regards,
Harish.
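One thing worth ruling out on the authentication error: the command lines show `┬á` right after the quoted user name, which is how the UTF-8 bytes of a no-break space (C2 A0) look when displayed in code page 437, and the ERROR line from UmlsUserApprover reports the user as `XXXXXXX-ß`, with trailing characters. This may be nothing more than a copy/paste artifact of the email, but if a stray invisible character really is part of the `-Dctakes.umlsuser` argument, it could be passed along to the UMLS account check shown in the log. The class below is a minimal, hypothetical sketch (`CheckUmlsProps` is not part of cTAKES) that reports any character in the two properties outside visible ASCII, without printing the password itself.

```java
/**
 * Hypothetical diagnostic, not part of cTAKES: reports every character in the
 * ctakes.umlsuser and ctakes.umlspw system properties that falls outside
 * visible ASCII, so an invisible character such as U+00A0 (no-break space)
 * picked up from a pasted command line becomes visible.
 */
public class CheckUmlsProps {

    /** Returns one line per suspicious character in value, or "" if there are none. */
    static String suspiciousChars(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Anything outside visible ASCII 0x21..0x7E (including ordinary and
            // no-break spaces, accented letters, ß, ...) is flagged.
            if (c < 0x21 || c > 0x7E) {
                sb.append(String.format("index %d: U+%04X%n", i, (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"ctakes.umlsuser", "ctakes.umlspw"}) {
            String value = System.getProperty(key);
            if (value == null) {
                System.out.println(key + " is not set");
                continue;
            }
            String report = suspiciousChars(value);
            // Only the length and the positions of odd characters are printed,
            // never the value itself, so the password is not echoed.
            System.out.println(key + ": length " + value.length()
                    + (report.isEmpty() ? ", no unusual characters" : ", unusual characters:\n" + report));
        }
    }
}
```

To use it, compile with `javac CheckUmlsProps.java` and run it with the two `-D` options copied exactly from the failing command, placed before the class name, e.g. `java -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXX" CheckUmlsProps`. Copying the options rather than retyping them matters, since retyping would drop any hidden character.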

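Raising -Xmx, as James suggests, only helps if the value actually reaches the JVM that runs the pipeline. A quick way to confirm the limits a JVM received is a minimal, hypothetical sketch like the one below (`HeapCheck` is not part of cTAKES or UIMA). It prints what `Runtime.maxMemory()` and `Runtime.totalMemory()` report when launched with the same memory flags as the CpmFrame command.

```java
/**
 * Hypothetical helper, not part of cTAKES or UIMA: prints the heap limits the
 * running JVM actually received, to confirm that -Xms/-Xmx took effect.
 */
public class HeapCheck {

    /** Converts a byte count to whole mebibytes (rounded down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit (Long.MAX_VALUE if there is none).
        System.out.println("max heap (MiB):        " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved; at startup it is close to -Xms.
        System.out.println("current heap (MiB):    " + toMiB(rt.totalMemory()));
        // freeMemory() is the unused part of the currently reserved heap.
        System.out.println("free in current (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

For example, `javac HeapCheck.java` followed by `java -Xms5g -Xmx5g HeapCheck`. The reported maximum can come out somewhat below the -Xmx value, depending on how the garbage collector sizes its spaces; a figure close to 5120 MiB for -Xmx5g means the setting took effect.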