Do not use AggregatePlainTextProcessor; it is simply slow. Use the fast version and debug the password issue. Make sure your UMLS credentials are set in
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
in two different places.

Tim

On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> Hi James,
>
> Thanks for responding.
>
> A single 2 MB file is taking ~5 hours to process with
> AggregatePlainTextProcessor. This is the command line, including the
> JVM memory arguments:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
>
> Also, just now I tried to process the file with the AE
> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
> an authentication error with the same username and password that I
> use with AggregatePlainTextProcessor.
>
> I can run AggregatePlaintextFastUMLSProcessor with -Xms and -Xmx
> increased to 5g. Could you please let me know how it is possible that
> AggregatePlainTextProcessor runs fine with the above username and
> password, while AggregatePlaintextFastUMLSProcessor gives the
> exception below with the same username and password?
>
> Exception:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58 INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00 INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00 INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> 13 Dec 2017 21:05:00 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß: ....
> 13 Dec 2017 21:05:02 ERROR UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>     at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>     at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>     at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>     at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>     at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>     ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed. (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>     ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>     ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>     ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>     at java.lang.reflect.Constructor.newInstance(Unknown Source)
>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>     ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>     ... 33 more
>
> From: James Masanz [mailto:[email protected]]
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: [email protected]
> Subject: Re: Slowness in processing files
>
> Using AggregatePlaintextFastUMLSProcessor is much faster than
> AggregatePlainTextProcessor, so I suggest that to start with you just
> use AggregatePlaintextFastUMLSProcessor.
>
> Do you mean it is taking ~5 hours for a single file to be processed
> at times, or is that for a set of files?
>
> If your JVM heap space is not set large enough, you can get very slow
> results.
> Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
> For faster startup, you can also set -Xms to the same value as -Xmx,
> or something close to it.
>
> -- James
>
> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]>
> wrote:
> Hi All,
>
> When the medical records are run with the AE
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
> the processing is very slow. It is pretty fast when smaller files
> (~2 KB) are fed as input, but with bigger files, say 2 MB, it is very
> slow and the files take ~5 hours to process. Any pointers will be of
> great help.
>
> Regards,
> Harish.
>
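
Both points raised in this thread can be checked before starting the CPE GUI: how much heap the JVM actually received from -Xmx, and whether the ctakes.umlsuser / ctakes.umlspw values contain characters that should not be there. The following is only a minimal sketch. The class name CtakesPreflight and its helper methods are made up for illustration and are not part of cTAKES. Treating whitespace and non-ASCII characters as suspicious is an assumption, prompted by the extra character after the user name in the UmlsUserApprover log lines quoted above; it is not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES: prints the JVM heap limit and
// inspects the UMLS credential system properties for unexpected characters.
final class CtakesPreflight {

    // Returns the indexes of characters that are whitespace, control
    // characters, or outside ASCII. Heuristic only: a real password may
    // legitimately contain such characters.
    static List<Integer> suspiciousCharIndexes(String value) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c <= 0x20 || c > 0x7E) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    // Describes one system property without printing its value.
    static String describe(String propertyName) {
        String value = System.getProperty(propertyName);
        if (value == null) {
            return propertyName + ": not set";
        }
        List<Integer> bad = suspiciousCharIndexes(value);
        String detail = bad.isEmpty()
                ? "no unexpected characters"
                : "unexpected characters at indexes " + bad;
        return propertyName + ": length " + value.length() + ", " + detail;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting the JVM was started with.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("Max heap (MB): " + maxHeapMb);

        // Property names as used on the command line in this thread.
        System.out.println(describe("ctakes.umlsuser"));
        System.out.println(describe("ctakes.umlspw"));
    }
}
```

Run it with the same -Xmx and -D arguments as the cTAKES command line. If a property is reported with unexpected characters, retyping the command by hand instead of pasting it may be worth trying.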
