Do not try to use AggregatePlainTextProcessor; it is just slow.
Use the fast version and debug the password issues.
Make sure you have your UMLS credentials set in:
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml

in two different places.
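
One way to check both places at once is to list every credential property in that descriptor. This is only a rough sketch: it assumes the credentials are stored as <property key="umlsUser" .../> and <property key="umlsPass" .../> elements and that an unedited file still carries the placeholder value CHANGE_ME. Both are assumptions about the descriptor layout, so adjust the names if your copy differs. The class name UmlsCredentialCheck is made up for this example, and the password value itself is never printed.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class UmlsCredentialCheck {

    /**
     * Returns one "key=status" entry per property element whose key is
     * umlsUser or umlsPass (assumed key names), in document order.
     */
    static List<String> findCredentialProperties(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        NodeList props = doc.getElementsByTagName("property");
        List<String> found = new ArrayList<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String key = p.getAttribute("key");
            if (key.equals("umlsUser") || key.equals("umlsPass")) {
                String value = p.getAttribute("value");
                // CHANGE_ME is the assumed placeholder for an unedited descriptor.
                boolean unset = value.isEmpty() || value.equals("CHANGE_ME");
                found.add(key + "=" + (unset ? "NOT SET" : "set"));
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the dictionary descriptor, e.g. sno_rx_16ab.xml
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        for (String entry : findCredentialProperties(xml)) {
            System.out.println(entry);
        }
    }
}
```

If the descriptor matches those assumptions, you should see four entries (user and password, twice each). Any entry marked NOT SET is a place that still needs your credentials.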

Tim



On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> Hi James,
>  
> Thanks for responding.
>  
> A single 2 MB file is taking ~5 hours to process with
> AggregatePlainTextProcessor. This is the command line, including the
> JVM memory arguments:
>  
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
>  
> Also, just now I tried to process the file with the AE
> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
> authentication fails with the same username and password that work
> with AggregatePlainTextProcessor.
>  
> I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and
> -Xmx5g. Could you please tell me how it is possible that
> AggregatePlainTextProcessor runs fine with the above username and
> password, while AggregatePlaintextFastUMLSProcessor throws the
> exception below with the same username and password?
>  
> Exception:
>  
>  C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß: ....
> 13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>         ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>         ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>         ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>         ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>         ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>         ... 33 more
>  
>  
>  
> From: James Masanz [mailto:[email protected]
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: [email protected]
> Subject: Re: Slowness in processing files
>  
> Using AggregatePlaintextFastUMLSProcessor  is much faster than
> AggregatePlainTextProcessor, so I suggest that to start with you just
> use AggregatePlaintextFastUMLSProcessor.
>  
> Do you mean it is taking ~5 hours for a single file to be processed
> at times, or is that for a set of files?
>  
> If your JVM heap space is not set large enough, you can get very slow
> results.
> Try increasing to 5G (or more) using the JVM parameter   -Xmx5G
> For faster start up, you can also set the -Xms to the same or
> something close to -Xmx value.
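
To confirm that the heap settings above are actually reaching the JVM, a tiny stand-alone check can help. This is a minimal sketch, not part of cTAKES: the class name HeapCheck is made up for illustration, and it relies only on the standard Runtime.maxMemory() call. Run it with the same -Xms/-Xmx flags you pass to the CPE.

```java
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Example invocation: java -Xms5G -Xmx5G HeapCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

The reported figure can come out somewhat below the -Xmx value, depending on the garbage collector in use, so treat it as a sanity check rather than an exact echo of the flag.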
>  
>  -- James
>  
> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]>
> wrote:
> Hi All,
>  
> When medical records are run with the AE
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
> processing is very slow. It is pretty fast when smaller files
> (~2 KB) are fed as input, but with bigger files, say 2 MB, it is
> very slow and the files take ~5 hours to process. Any pointers
> would be of great help.
>  
> Regards,
> Harish.
>  
