Sorry, I meant: verify that the contents of the XML file for the fast dictionary lookup (AggregatePlaintextFastUMLSProcessor) haven't changed.
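One way to run that check, as a minimal sketch only: diff the installed descriptor against the same file from a freshly unpacked cTAKES distribution. The function names and the paths in the usage comment are hypothetical placeholders, not anything shipped with cTAKES. Note that a descriptor in which you entered your UMLS credentials will legitimately differ on those lines.

```python
import difflib
from pathlib import Path


def diff_text(pristine: str, installed: str) -> list[str]:
    """Return unified-diff lines; an empty list means the texts are identical."""
    return list(
        difflib.unified_diff(
            pristine.splitlines(),
            installed.splitlines(),
            fromfile="pristine",
            tofile="installed",
            lineterm="",
        )
    )


def diff_files(pristine_path: str, installed_path: str) -> list[str]:
    """Read both descriptor files and diff their contents."""
    pristine = Path(pristine_path).read_text(encoding="utf-8", errors="replace")
    installed = Path(installed_path).read_text(encoding="utf-8", errors="replace")
    return diff_text(pristine, installed)


# Hypothetical usage; both paths are placeholders for your own layout:
# for line in diff_files("path/to/fresh/descriptor.xml",
#                        "path/to/installed/descriptor.xml"):
#     print(line)
```

An empty result means the installed file matches the pristine copy; any "-" or "+" lines show exactly what was changed.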
On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <[email protected]> wrote:
>
> Harish,
>
> With AggregatePlaintextFastUMLSProcessor, it should not take that long.
> It sounds like either something outside of cTAKES is having a problem
> (such as a hard drive starting to fail) or you are accidentally running
> AggregatePlaintextUMLSProcessor.
>
> I've had issues with the CPE GUI not always behaving well for me.
>
> I suggest that when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
> Also, are you using a Cas Consumer? If so, which one?
>
>
> On Thu, Dec 14, 2017 at 12:04 PM, <[email protected]> wrote:
>
>> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to
>> take ~11*1000 seconds, which is about 3 hours (assuming that the
>> runtime is linear in the file size).
>>
>> I do not know whether the pipeline can be sped up. I would suggest
>> splitting the file into smaller chunks and running the pipeline in
>> parallel on each chunk.
>>
>> Jonas S
>>
>> On 14.12.17 at 17:48, Yadav, Harish wrote:
>>
>>> Hi Timothy,
>>>
>>> Sorry for the typo. I meant that I ran with the AE
>>> AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still
>>> takes a long time (more than 2 hours) for a single 2 MB file.
>>> It runs fine for a 2 KB file.
>>>
>>> Regards,
>>> Harish.
>>>
>>> -----Original Message-----
>>> From: Miller, Timothy [mailto:[email protected]]
>>> Sent: Thursday, December 14, 2017 11:22 AM
>>> To: [email protected]
>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>
>>> You missed the most important part of my message:
>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>>
>>>
>>> Use AggregatePlaintextFastUMLSProcessor
>>>
>>> Tim
>>>
>>>
>>> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>>>
>>>> Hi Timothy,
>>>>
>>>> I fixed the password issues and ran with the AE
>>>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
>>>> long time (more than 2 hours) for a single 2 MB file. I checked the
>>>> memory consumption of the process and it never goes above 4.5 GB, so
>>>> I am not sure it is a memory issue. AggregatePlainTextProcessor
>>>> processes a 2 KB file in ~11 seconds, but most of our files are in
>>>> the megabyte range, so a processing time of more than 2 hours per
>>>> file is not feasible.
>>>>
>>>> Could you please suggest something that may improve the performance?
>>>> Below are the logs for processing the 2 MB file with
>>>> AggregatePlainTextProcessor:
>>>>
>>>>
>>>>
>>>> Logs:
>>>>
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
>>>> org.apache.uima.tools.cpm.CpmFrame
>>>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>>> at root 0x80000002. Windows
>>>> RegCreateKeyEx(...) returned error code 5.
>>>> log4j: reset attribute= "false".
>>>> log4j: Threshold ="null".
>>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>>> log4j: Setting [ProgressAppender] additivity to [false].
>>>> log4j: Level value for ProgressAppender is [INFO].
>>>> log4j: ProgressAppender level set to INFO >>>> log4j: Class name: [org.apache.log4j.ConsoleAppender] >>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" >>>> log4j: Setting property [conversionPattern] to [%m]. >>>> log4j: Adding appender named [noEolAppender] to category >>>> [ProgressAppender]. >>>> log4j: Retreiving an instance of org.apache.log4j.Logger. >>>> log4j: Setting [ProgressDone] additivity to [false]. >>>> log4j: Level value for ProgressDone is [INFO]. >>>> log4j: ProgressDone level set to INFO >>>> log4j: Class name: [org.apache.log4j.ConsoleAppender] >>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" >>>> log4j: Setting property [conversionPattern] to [%m%n]. >>>> log4j: Adding appender named [eolAppender] to category [ProgressDone]. >>>> log4j: Level value for root is [INFO]. >>>> log4j: root level set to INFO >>>> log4j: Class name: [org.apache.log4j.ConsoleAppender] >>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" >>>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy >>>> HH:mm:ss} %5p %c{1} - %m%n]. >>>> log4j: Adding appender named [consoleAppender] to category [root]. >>>> 14 Dec 2017 09:42:09 INFO Chunker - Chunker model file: >>>> org/apache/ctakes/chunker/models/chunker-model.zip >>>> 14 Dec 2017 09:42:10 INFO TokenizerAnnotatorPTB - Initializing >>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB >>>> 14 Dec 2017 09:42:10 INFO ContextDependentTokenizerAnnotator - Finite >>>> state machines loaded. 
>>>> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using >>>> dictionary lookup window type: >>>> org.apache.ctakes.typesystem.type.textspan.Sentence >>>> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Exclusion >>>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB >>>> VBD VBG VBN VBP VBZ WDT WP WPS WRB >>>> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using minimum >>>> term text span: 3 >>>> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using >>>> Dictionary Descriptor: >>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml >>>> 14 Dec 2017 09:42:10 INFO DictionaryDescriptorParser - Parsing >>>> dictionary specifications: >>>> 14 Dec 2017 09:42:10 INFO UmlsUserApprover - Checking UMLS Account at >>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm. >>>> nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW >>>> 14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- >>>> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC >>>> EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for >>>> user harish1234: >>>> .14 Dec 2017 09:42:11 INFO UmlsUserApprover - UMLS Account at http >>>> s://urldefense.proofpoint.com/v2/url?u=https-3A__uts- >>>> 2Dws.nlm.nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69z >>>> y_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- >>>> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC >>>> EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for >>>> user harish1234 has been validated >>>> >>>> 14 Dec 2017 09:42:11 INFO JdbcConnectionFactory - Connecting to >>>> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s >>>> no_rx_16ab/sno_rx_16ab: >>>> 14 Dec 2017 09:42:11 INFO ENGINE - open start - state not modified >>>> .................. 
>>>> 14 Dec 2017 09:42:17 INFO JdbcConnectionFactory - Database connected >>>> 14 Dec 2017 09:42:17 INFO JdbcRareWordDictionary - Connected to cui >>>> and term table CUI_TERMS >>>> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept >>>> table TUI with class TUI >>>> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept >>>> table RXNORM with class LONG >>>> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept >>>> table PREFTERM with class PREFTERM >>>> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept >>>> table SNOMEDCT_US with class LONG >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope >>>> sizes: 10 , 10 >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: >>>> LEFT,RIGHT >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3] >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: >>>> org.apache.ctakes.necontexts.status.StatusContextAnalyzer >>>> 14 Dec 2017 09:42:17 INFO StatusContextAnalyzer - initBoundaryData() >>>> called for ContextInitializer >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: >>>> org.apache.ctakes.necontexts.status.StatusContextHitConsumer >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window >>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: >>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: >>>> org.apache.ctakes.typesystem.type.syntax.BaseToken >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope >>>> sizes: 7 , 7 >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: >>>> LEFT,RIGHT >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3] >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: >>>> 
org.apache.ctakes.necontexts.negation.NegationContextAnalyzer >>>> 14 Dec 2017 09:42:17 INFO NegationContextAnalyzer - >>>> initBoundaryData() called for ContextInitializer >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: >>>> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window >>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: >>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation >>>> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: >>>> org.apache.ctakes.typesystem.type.syntax.BaseToken >>>> 14 Dec 2017 09:42:17 INFO SentenceDetector - Sentence detector model >>>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip >>>> 14 Dec 2017 09:42:17 INFO POSTagger - POS tagger model file: >>>> org/apache/ctakes/postagger/models/mayo-pos.zip >>>> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - Loading NLM Norm >>>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0- >>>> bin\apache-ctakes- >>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties >>>> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - config file >>>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes- >>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties >>>> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cwd = >>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0 >>>> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd >>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes- >>>> 4.0.0\resources\org\apache\ctakes\lvg\ >>>> 14 Dec 2017 09:42:18 INFO ENGINE - open start - state not modified >>>> 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open start >>>> 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open end >>>> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd >>>> 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0 >>>> 14 Dec 2017 09:42:18 INFO DrugMentionAnnotator - Finite state >>>> machines loaded. >>>> 14 Dec 2017 09:42:23 INFO ClearNLPDependencyParserAE - using Morphy >>>> analysis? true Loading configuration. >>>> Loading feature templates. >>>> Loading lexica. >>>> Loading model: >>>> ..................................................................... >>>> ................... >>>> Loading configuration. >>>> Loading feature templates. >>>> Loading model: >>>> . >>>> Loading configuration. >>>> Loading feature templates. >>>> Loading lexica. >>>> Loading model: >>>> ... >>>> <various Loading model> >>>> . >>>> Loading configuration. >>>> Loading feature templates. >>>> Loading lexica. >>>> Loading model: >>>> ................................ >>>> Loading model: >>>> ............................. >>>> 14 Dec 2017 09:42:32 INFO ConstituencyParser - Initializing parser... >>>> 14 Dec 2017 09:42:33 INFO SentenceDetector - Starting processing. 
>>>> 14 Dec 2017 09:42:34 INFO TokenizerAnnotatorPTB - process(JCas) in >>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB >>>> 14 Dec 2017 09:42:36 INFO LvgAnnotator - process(JCas) >>>> 14 Dec 2017 09:42:55 INFO ContextDependentTokenizerAnnotator - >>>> process(JCas) >>>> 14 Dec 2017 09:42:58 INFO POSTagger - process(JCas) >>>> 14 Dec 2017 09:43:10 INFO Chunker - process(JCas) >>>> 14 Dec 2017 09:43:46 INFO ChunkAdjuster - process(JCas) >>>> 14 Dec 2017 09:43:47 INFO ChunkAdjuster - process(JCas) >>>> 14 Dec 2017 09:43:48 INFO AbstractJCasTermAnnotator - Starting >>>> processing >>>> 14 Dec 2017 09:43:54 INFO AbstractJCasTermAnnotator - Finished >>>> processing >>>> 14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas) >>>> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:34 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:38 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:39 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:42 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:43 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:45 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:53 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:54 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:45:59 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:00 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator - >>>> 14 Dec 
2017 09:46:05 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:06 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:08 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:11 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:16 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:24 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:27 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:30 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:32 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:35 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:38 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:45 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:53 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:46:54 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:02 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:22 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:24 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:28 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:29 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:34 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:38 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:46 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:49 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:54 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:54 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:47:58 INFO DrugMentionAnnotator - >>>> 14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing: >>>> idd_secondTrial.txt >>>> 14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing: >>>> idd_secondTrial.txt >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> Regards, >>>> Harish. 
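The timestamps in the log above show where the time goes: the longest silence is between MaxentParserWrapper's "Started processing" at 09:48:45 and "Done parsing" at 10:20:19, roughly 31.5 minutes. Below is a small sketch for pulling such gaps out of a log in this format. The regular expression assumes the "dd MMM yyyy HH:mm:ss LEVEL Logger - message" layout seen above, and the function name is hypothetical. Each gap is attributed to the logger that wrote the earlier of the two lines, which is only a rough proxy for which component was busy.

```python
import re
from datetime import datetime

# Matches e.g. "14 Dec 2017 09:48:45 INFO MaxentParserWrapper - "
STAMP = re.compile(r"(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+) - ")


def largest_gaps(log_text: str, top: int = 3) -> list[tuple[float, str]]:
    """Return the `top` largest (seconds, logger) gaps between consecutive log lines."""
    events = []
    for match in STAMP.finditer(log_text):
        when = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
        events.append((when, match.group(3)))

    gaps = []
    for (start, logger), (end, _next_logger) in zip(events, events[1:]):
        # Time elapsed after `logger` wrote its line, until the next line appeared.
        gaps.append(((end - start).total_seconds(), logger))
    return sorted(gaps, reverse=True)[:top]
```

Because it scans with a regular expression rather than line by line, the sketch also works on log text whose line breaks were lost in quoting; timestamps that were split by a quote marker are simply skipped.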
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Miller, Timothy [mailto:[email protected]]
>>>> Sent: Thursday, December 14, 2017 9:16 AM
>>>> To: [email protected]
>>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>> Use the fast version and debug the password issues.
>>>> Make sure you have your UMLS credentials set in:
>>>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>>
>>>> in two different places.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>>>
>>>>>
>>>>> Hi James,
>>>>> Thanks for responding.
>>>>> A single 2 MB file is taking ~5 hours to process with
>>>>> AggregatePlainTextProcessor. This is the command line I use,
>>>>> including the JVM memory arguments:
>>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>>> -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXXXX" -cp
>>>>> "C:\New_Drive\apache-ctakes-4.0.0-bi
>>>>> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
>>>>> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
>>>>> 4.0.0-
>>>>> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
>>>>> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
>>>>> ctakes-
>>>>> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>> Also, just now I tried to process the file with the AE
>>>>> AggregatePlaintextFastUMLSProcessor but ran into a different
>>>>> problem: authentication fails with the same username and password
>>>>> that I use with AggregatePlainTextProcessor.
>>>>> I can run it with AggregatePlaintextFastUMLSProcessor by increasing >>>>> Xms 5g and Xmx5g, if you could please let me know how can it be >>>>> possible that with one AE AggregatePlainTextProcessor it is running >>>>> fine with above username and password but giving below exception >>>>> with same username, password with >>>>> AggregatePlaintextFastUMLSProcessor. >>>>> Exception: >>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java >>>>> -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp >>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes- >>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes- >>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache- >>>>> ctakes- >>>>> 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache- >>>>> ctakes- >>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g >>>>> org.apache.uima.tools.cpm.CpmFrame Dec 13, 2017 9:01:20 PM >>>>> java.util.prefs.WindowsPreferences <init> WARNING: Could not >>>>> open/create prefs root node Software\JavaSoft\Prefs at root >>>>> 0x80000002. Windows RegCreateKeyEx(...) returned error code 5. >>>>> log4j: >>>>> attributes.... 13 Dec 2017 21:04:58 INFO Chunker - Chunker model >>>>> file: org/apache/ctakes/chunker/models/chunker-model.zip 13 Dec >>>>> 2017 >>>>> 21:05:00 INFO TokenizerAnnotatorPTB - Initializing >>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB 13 Dec 2017 >>>>> 21:05:00 >>>>> INFO ContextDependentTokenizerAnnotator - Finite state machines >>>>> loaded. 
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using >>>>> dictionary lookup window type: >>>>> org.apache.ctakes.typesystem.type.textspan.Sentence 13 Dec 2017 >>>>> 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: >>>>> CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN >>>>> VBP VBZ WDT WP WPS WRB 13 Dec 2017 21:05:00 INFO >>>>> AbstractJCasTermAnnotator - Using minimum term text span: 3 13 Dec >>>>> 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using Dictionary >>>>> Descriptor: >>>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml >>>>> 13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing >>>>> dictionary specifications: 13 Dec 2017 21:05:00 INFO >>>>> UmlsUserApprover >>>>> - Checking UMLS Account at https://urldefense.proofpoint.com/v2/url >>>>> ?u=https-3A__uts- >>>>> 2Dws.nlm.nih.go&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppx >>>>> eFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- >>>>> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ >>>>> vCEkIgDc6DU1Nbw&s=v_ivdTVH9oojQd-0bohfzxVCl5UxJlSZ5FUfi7qnmxo&e= >>>>> v/restful/isValidUMLSUser for user harish1234-ß: ....13 Dec 2017 >>>>> 21:05:02 ERROR UmlsUserApprover - UMLS Account at https://urldefe >>>>> nse.proofpoint.com/v2/url?u=https-3A__uts- >>>>> 2Dws.nl&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=He >>>>> up-IbsIg9Q1TPOylpP9FE4GTK- >>>>> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ >>>>> vCEkIgDc6DU1Nbw&s=U8OuKgmE0YMDPABaTm39jDFIXG4tnVEeE1rrCS03cbM&e= >>>>> m.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß >>>>> with XXXXXXX >>>>> org.apache.uima.resource.ResourceInitializationException: >>>>> Initialization of CAS Processor with name >>>>> "AggregatePlaintextFastUMLSProcessor" failed. 
at >>>>> org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini >>>>> ti >>>>> alize(CollectionProcessingEngine_impl.java:81) at >>>>> org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessin >>>>> gE >>>>> ngine(UIMAFramework_impl.java:420) at >>>>> org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIM >>>>> AF >>>>> ramework.java:918) at >>>>> org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:57 >>>>> 3) >>>>> at >>>>> org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105) >>>>> at >>>>> org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713) Caused >>>>> by: org.apache.uima.resource.ResourceConfigurationException: >>>>> Initialization of CAS Processor with name >>>>> "AggregatePlaintextFastUMLSProcessor" failed. at >>>>> org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt >>>>> eg >>>>> ratedCasProcessor(CPEFactory.java:1101) at >>>>> org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProc >>>>> es >>>>> sors(CPEFactory.java:547) at >>>>> org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.ja >>>>> va >>>>> :253) at >>>>> org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl. >>>>> ja >>>>> va:127) at >>>>> org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini >>>>> ti >>>>> alize(CollectionProcessingEngine_impl.java:73) ... 5 more >>>>> Caused by: >>>>> org.apache.uima.resource.ResourceInitializationException: >>>>> Initialization of annotator class >>>>> "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" >>>>> failed. 
(Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0- >>>>> bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup- >>>>> fast/desc/analysis_engine/UmlsLookupAnnotator.xml) at >>>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i >>>>> ni >>>>> tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271) >>>>> at >>>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i >>>>> ni >>>>> tialize(PrimitiveAnalysisEngine_impl.java:170) at >>>>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana >>>>> ly >>>>> sisEngineFactory_impl.java:94) at >>>>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource( >>>>> Co >>>>> mpositeResourceFactory_impl.java:62) at >>>>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27 >>>>> 9) >>>>> at >>>>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j >>>>> av >>>>> a:407) at >>>>> org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.ja >>>>> va >>>>> :256) at >>>>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i >>>>> ni >>>>> tASB(AggregateAnalysisEngine_impl.java:429) at >>>>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i >>>>> ni >>>>> tializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:37 >>>>> 3) >>>>> at >>>>> org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i >>>>> ni >>>>> tialize(AggregateAnalysisEngine_impl.java:186) at >>>>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana >>>>> ly >>>>> sisEngineFactory_impl.java:94) at >>>>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource( >>>>> Co >>>>> mpositeResourceFactory_impl.java:62) at >>>>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27 >>>>> 9) >>>>> at >>>>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:33 >>>>> 1) >>>>> at >>>>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j >>>>> av >>>>> a:448) 
at >>>>> org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt >>>>> eg >>>>> ratedCasProcessor(CPEFactory.java:1085) ... 9 more Caused >>>>> by: >>>>> org.apache.uima.resource.ResourceInitializationException: MESSAGE >>>>> LOCALIZATION FAILED: Can't find resource for bundle >>>>> java.util.PropertyResourceBundle, key C ould not construct >>>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic >>>>> ti >>>>> onary at >>>>> org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i >>>>> ni >>>>> tialize(AbstractJCasTermAnnotator.java:131) at >>>>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i >>>>> ni >>>>> tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266) >>>>> ... 24 more Caused by: >>>>> org.apache.uima.analysis_engine.annotator.AnnotatorContextException >>>>> : >>>>> MESSAGE LOCALIZATION FAILED: Can't find resource for bundle >>>>> java.util.PropertyResourceBu ndle, key Could not construct >>>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic >>>>> ti >>>>> onary at >>>>> org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto >>>>> rP >>>>> arser.parseDictionary(DictionaryDescriptorParser.java:199) >>>>> at >>>>> org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto >>>>> rP >>>>> arser.parseDictionaries(DictionaryDescriptorParser.java:156) >>>>> at >>>>> org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto >>>>> rP >>>>> arser.parseDescriptor(DictionaryDescriptorParser.java:128) >>>>> at >>>>> org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i >>>>> ni >>>>> tialize(AbstractJCasTermAnnotator.java:129) ... 
25 more
>>>>> Caused
>>>>> by: java.lang.reflect.InvocationTargetException at
>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>>> Method)
>>>>> at
>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>>>>> Source)
>>>>> at
>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
>>>>> Source) at
>>>>> java.lang.reflect.Constructor.newInstance(Unknown
>>>>> Source) at
>>>>> org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto
>>>>> rP
>>>>> arser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>>> ... 28 more Caused by: java.sql.SQLException: Invalid User for UMLS
>>>>> dictionary sno_rx_16abTerms at
>>>>> org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic
>>>>> ti
>>>>> onary.<init>(UmlsJdbcRareWordDictionary.java:29) ... 33 more
>>>>>
>>>>> From: James Masanz [mailto:[email protected]]
>>>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Slowness in processing files
>>>>>
>>>>> AggregatePlaintextFastUMLSProcessor is much faster than
>>>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>>>> just use AggregatePlaintextFastUMLSProcessor.
>>>>> Do you mean it is taking ~5 hours for a single file to be processed
>>>>> at times, or is that for a set of files?
>>>>> If your JVM heap space is not set large enough, you can get very
>>>>> slow results.
>>>>> Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
>>>>> For faster startup, you can also set -Xms to the same value as
>>>>> -Xmx, or something close to it.
>>>>> -- James
>>>>>
>>>>> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:
>>>>> Hi All,
>>>>> When medical records are run with the AE
>>>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
>>>>> processing is very slow. It is fairly fast when smaller files
>>>>> (~2 KB) are fed as input, but with bigger files, say 2 MB, it is
>>>>> very slow and the files take ~5 hours to process. Any pointers
>>>>> would be a great help.
>>>>> Regards,
>>>>> Harish.
>>>>>
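Jonas's two points in the thread above, the linear runtime estimate and splitting a large file into chunks, can be sketched as follows. This is only an illustration: the function names are hypothetical, the 2 KB / 11 s reference point is the figure reported in the thread, and the default chunk size is an arbitrary assumption. Splitting at line boundaries can still separate text that belongs together, so annotations near chunk edges, and all character offsets, may differ from a single-file run.

```python
from pathlib import Path


def estimate_seconds(size_bytes: int, ref_bytes: int = 2_000, ref_seconds: float = 11.0) -> float:
    """Linear extrapolation from one measured (size, time) pair."""
    return ref_seconds * size_bytes / ref_bytes


def split_into_chunks(text: str, max_chars: int = 50_000) -> list[str]:
    """Split text at line boundaries; a chunk exceeds max_chars only if one line does."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def write_chunks(chunks: list[str], out_dir: str, stem: str) -> list[Path]:
    """Write each chunk to its own numbered .txt file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunks):
        path = out / f"{stem}_part{index:04d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

With the figures from the thread, estimate_seconds(2_000_000) gives 11000 seconds, which matches the "about 3 hours" above. Each chunk file can then be given to a separate run; how many runs fit in parallel depends on available memory, since each JVM in the thread was started with several GB of heap.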
