Re: Slowness in processing files [EXTERNAL]

James Masanz Fri, 15 Dec 2017 12:17:26 -0800

I tried an input file of 5.5K and I was surprised to find it took 11
seconds on my laptop.


I'll run a 2MB input file and post results tomorrow. I'll also compare
running from binary vs. running from within an IDE in case the timings are
affected by the size of the jars built for the binary install.

With the 5.5K input file, the annotators taking the most time were
  ConstituencyParser - 39%
  HistoryCleartk - 11%
  PolarityCleartk - 11%
  LVG annotator - 8%
  GenericCleartk - 7.5%

Note the above numbers are from a single run of a single file.
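
To put those percentages in absolute terms, here is a rough sketch that
converts each share into seconds. It assumes the percentages apply to the
whole 11 seconds reported above, which may not hold exactly if that figure
includes pipeline start-up. The names `shares`, `TOTAL_SECONDS` and
`seconds_per_annotator` are illustrative only and are not part of cTAKES.

```python
# Percent of run time per annotator, as reported for the single 5.5K run above.
shares = {
    "ConstituencyParser": 39.0,
    "HistoryCleartk": 11.0,
    "PolarityCleartk": 11.0,
    "LVG annotator": 8.0,
    "GenericCleartk": 7.5,
}

TOTAL_SECONDS = 11.0  # assumption: the percentages apply to the whole 11 s


def seconds_per_annotator(shares, total_seconds):
    """Convert percentage shares into approximate seconds."""
    return {name: total_seconds * pct / 100.0 for name, pct in shares.items()}


breakdown = seconds_per_annotator(shares, TOTAL_SECONDS)
for name, secs in breakdown.items():
    print(f"{name:20s} {secs:5.2f} s")

# Combined share of the run taken by the five annotators listed.
listed_pct = sum(shares.values())
print(f"listed annotators account for {listed_pct:.1f}% of the run")
```

Since this is one run of one file, treat the resulting seconds as an
indication of where the time goes rather than as a benchmark.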

If you're not using the output of some of the longer-running annotators in
your environment (or of any downstream annotators that depend on their
output), you could consider removing them from your pipeline.

For those not familiar with the CPE GUI, after it processes a set of
documents, it outputs a performance report showing the percentage and
absolute time taken by each annotator in a pipeline.




On Thu, Dec 14, 2017 at 2:15 PM, Yadav, Harish <[email protected]> wrote:

> Hi James,
>
>
>
> Below is the CAS consumer detail:
>
>
>
> FileWriterCasConsumer
>
>
>
> Descriptor in collection reader:
>
>
>
> FilesInDirectoryCollectionReader.xml
>
>
>
> The contents of AggregatePlaintextFastUMLSProcessor have not changed, and I
> have always used the CPE GUI with the Clear All option. I am not sure about
> hard drive error logs, but I will check them as one of the possibilities.
>
>
>
> Could you please let me know approximately how long it took you to run
> files of about 2 MB (or share any other benchmarks for other file sizes
> you used earlier)?
>
>
>
> Regards,
>
> Harish.
>
>
>
> *From:* James Masanz [mailto:[email protected]]
> *Sent:* Thursday, December 14, 2017 1:21 PM
> *To:* [email protected]
> *Subject:* Re: Slowness in processing files [EXTERNAL]
>
>
>
> Sorry, I meant verify that the contents of the XML file for the fast
> dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor).
>
>
>
> On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <[email protected]>
> wrote:
>
>
>
> Harish,
>
>
>
> With the AggregatePlaintextFastUMLSProcessor, it should not be taking
> that long. It sounds like either something outside of cTAKES is having an
> issue (such as a hard drive starting to fail) or you are accidentally
> running AggregatePlaintextUMLSProcessor.
>
>
> I've had issues with the CPE GUI not always behaving well for me.
>
>
>
> I suggest when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
>
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
>
>
> Also, are you using a CAS Consumer? If so, which one?
>
>
>
>
>
> On Thu, Dec 14, 2017 at 12:04 PM, <[email protected]> wrote:
>
> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take
> ~11*1000 seconds, which is about 3 hours (assuming that the runtime is
> linear in the file size).
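
As a back-of-the-envelope check of the estimate above, the arithmetic can be
sketched as follows. It rests on the same linearity assumption, and it treats
1 MB as 1000 KB, as the estimate does. The function name
`extrapolate_seconds` is hypothetical. Fixed start-up costs such as model
loading are ignored, so a real run need not follow this line.

```python
def extrapolate_seconds(small_kb, small_seconds, large_kb):
    """Scale a measured run time linearly with input size (an assumption)."""
    return small_seconds * (large_kb / small_kb)


# 2 KB in ~11 s, scaled up to 2 MB (taken here as 2000 KB).
estimate = extrapolate_seconds(small_kb=2, small_seconds=11, large_kb=2000)
print(f"{estimate:.0f} s = {estimate / 3600:.2f} h")
```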
>
> I do not know if the pipeline can be sped up. I would suggest splitting the
> file into smaller chunks and running the pipeline in parallel on each
> chunk.
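
A minimal sketch of the chunking step, assuming it is acceptable to break the
input at line boundaries. That is an assumption about the data: splitting a
clinical note this way could separate text that an annotator needs to see
together, and annotation offsets would then be relative to each chunk. The
function name `chunk_lines` and the size limit are illustrative only. Running
the pipeline on each chunk is not shown.

```python
def chunk_lines(text, max_chars):
    """Split text into chunks of at most max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


sample = "line one\nline two\nline three\n"
parts = chunk_lines(sample, max_chars=18)
assert "".join(parts) == sample  # nothing lost or reordered
```

Each chunk could then be written to its own file in the input directory so
that the collection reader picks the chunks up as separate documents.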
>
> Jonas S
>
> On 14.12.17 at 17:48, Yadav, Harish wrote:
>
> Hi Timothy,
>
> Sorry for the typo. I meant that I ran with AE
> AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes
> a long time (more than 2 hours) for a single 2 MB file. It runs fine for a
> 2 KB file.
>
> Regards,
> Harish.
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Thursday, December 14, 2017 11:22 AM
> To: [email protected]
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> You missed the most important part of my message:
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
>
>
> Use AggregatePlaintextFastUMLSProcessor
>
> Tim
>
>
> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>
> Hi Timothy,
>
> I fixed the password issues and ran with AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
> long time (more than 2 hours) for a single 2 MB file. I have checked
> the memory consumption of the process and it never goes above 4.5 GB,
> so I am not sure that memory is the issue. AE
> AggregatePlainTextProcessor processes the 2 KB file in ~11 seconds, but
> most of our files are several MB in size, so more than 2 hours of
> processing time per file is not feasible.
>
> Could you please suggest something that may improve the performance?
> Below are the logs for processing the 2 MB file with
> AggregatePlainTextProcessor:
>
>
>
> Logs:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
> at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
> state machines loaded.
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
> term text span: 3
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
> dictionary specifications:
> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234
> has been validated
>
> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
> no_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
> ..................
> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table TUI with class TUI
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
> sizes: 10 , 10
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
> called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
> sizes: 7 , 7
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
> machines loaded.
> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
> analysis? true Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
> processing
> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
> processing
> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
>
>
>
>
>
>
>
> Regards,
> Harish.
>
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: [email protected]
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>
> in two different places.
>
> Tim
>
>
>
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>
>
> Hi James,
>   Thanks for responding.
>   A single 2 MB file is taking ~5 hours to process with
> AggregatePlainTextProcessor. This is what the command looks like,
> including the JVM memory arguments:
>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
> 4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> org.apache.uima.tools.cpm.CpmFrame
>   Also, I just tried to process the file with AE
> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
> an authentication error with the same username and password that I
> use with AggregatePlainTextProcessor.
>   I can run AggregatePlaintextFastUMLSProcessor with -Xms and -Xmx
> increased to 5g. Could you please let me know how it is possible that
> AE AggregatePlainTextProcessor runs fine with the above username and
> password, while AggregatePlaintextFastUMLSProcessor gives the
> exception below with the same username and password?
>   Exception:
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>         ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>         ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>         ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>         ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>         ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>         ... 33 more
>       From: James Masanz [mailto:[email protected]]
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: [email protected]
> Subject: Re: Slowness in processing files
>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
> AggregatePlainTextProcessor, so I suggest that to start with you
> just use AggregatePlaintextFastUMLSProcessor.
>   Do you mean it is taking ~5 hours for a single file to be processed
> at times, or is that for a set of files?
>   If your JVM heap space is not set large enough, you can get very
> slow results.
> Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
> For faster start up, you can also set -Xms to the same value as -Xmx
> or to something close to it.
>     -- James
>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:
> Hi All,
>   When medical records are run with either
> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor as
> the AE, processing is very slow. It is pretty fast when smaller files
> (~2 KB) are fed as input, but bigger files of, say, 2 MB are very slow
> and take ~5 hours to process. Any pointers would be a great help.
>   Regards,
> Harish.
>
>
>
>
>
>
>
>
