Re: Slowness in processing files [EXTERNAL]

James Masanz Thu, 14 Dec 2017 10:21:44 -0800

Sorry, I meant: verify that the contents of the XML file for the fast
dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor).
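
One way to check that a descriptor file has not changed is to compare its
checksum against a pristine copy, for example one unpacked from a fresh
download of the same cTAKES release. The sketch below is only an
illustration; the file names in the usage comment are hypothetical and are
not taken from this thread.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have byte-identical contents (by digest)."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


# Hypothetical usage: compare the installed descriptor with a pristine copy.
# same_contents("installed/descriptor.xml", "pristine/descriptor.xml")
```

A differing digest only says that the bytes differ (even a changed line
ending counts); a diff tool is still needed to see what changed.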


On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <[email protected]>
wrote:

>
> Harish,
>
> with the AggregatePlaintextFastUMLSProcessor, it should not be taking
> that long. It sounds like either something outside of cTAKES is having an
> issue (a hard drive starting to fail) or that you are accidentally running
> AggregatePlaintextUMLSProcessor.
>
> I've had issues with the CPE GUI not always behaving well for me.
>
> I suggest when you run the CPE GUI, you use File->Clear all and
> re-enter/re-select what you want.
> If that doesn't help, verify that the contents of
> AggregatePlaintextUMLSProcessor haven't been changed.
>
> If none of that helps, as a last resort, I'd look into hard drive error
> logs.
>
> Also, are you using a CAS Consumer? If so, which one?
>
>
> On Thu, Dec 14, 2017 at 12:04 PM, <[email protected]> wrote:
>
>> If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to
>> take ~11*1000 seconds, which is about 3 hours (assuming that the runtime
>> is linear in the file size).
>>
>> I do not know if the pipeline can be sped up. I would suggest splitting
>> the file into smaller chunks and running the pipeline in parallel on
>> the chunks.
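
The estimate and the chunking idea in this message can be sketched as
follows. This is a rough illustration, not part of cTAKES: the function
names are made up, the linear model is the assumption stated above, and
splitting on line boundaries is only one possible choice. It can separate
text that an annotator would want to see together, and character offsets
in the output will be relative to each chunk.

```python
def estimate_seconds(size_bytes, ref_bytes=2_000, ref_seconds=11.0):
    """Linear extrapolation from a reference run (2 KB in about 11 s)."""
    return ref_seconds * size_bytes / ref_bytes


def split_into_chunks(text, max_chars=2_000):
    """Split text into chunks of at most max_chars, breaking between lines.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks, current, current_len = [], [], 0
    for line in text.splitlines(keepends=True):
        # Close the current chunk if adding this line would exceed the limit.
        if current and current_len + len(line) > max_chars:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Under the linear assumption, estimate_seconds(2_000_000) gives 11000
seconds, or about 3 hours. Each chunk could then be written to its own
file and handed to a separate pipeline run.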
>>
>> Jonas S
>>
>> On 14.12.17 at 17:48, Yadav, Harish wrote:
>>
>>> Hi Timothy,
>>>
>>> Sorry for the typo. I meant that I ran with the AE
>>> AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still
>>> takes a long time (more than 2 hours) for a single 2 MB file. It runs
>>> fine for a 2 KB file.
>>>
>>> Regards,
>>> Harish.
>>>
>>> -----Original Message-----
>>> From: Miller, Timothy [mailto:[email protected]]
>>> Sent: Thursday, December 14, 2017 11:22 AM
>>> To: [email protected]
>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>
>>> You missed the most important part of my message:
>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>>
>>>
>>> Use AggregatePlaintextFastUMLSProcessor
>>>
>>> Tim
>>>
>>>
>>> On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
>>>
>>>> Hi Timothy,
>>>>
>>>> I fixed the password issues and ran with the AE
>>>> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
>>>> long time (more than 2 hours) for a single 2 MB file. I have checked
>>>> the memory consumption of the process and it never goes above 4.5 GB,
>>>> so I am not sure that memory is the issue. However, the AE
>>>> AggregatePlainTextProcessor processes the 2 KB file in ~11 seconds, but
>>>> most of our files are megabytes in size, so a processing time of more
>>>> than 2 hours per file is not feasible.
>>>>
>>>> Could you please suggest something that may improve the performance?
>>>> Below are the logs for processing the 2 MB file with
>>>> AggregatePlainTextProcessor:
>>>>
>>>>
>>>>
>>>> Logs:
>>>>
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
>>>> org.apache.uima.tools.cpm.CpmFrame
>>>> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>>> at root 0x80000002. Windows
>>>> RegCreateKeyEx(...) returned error code 5.
>>>> log4j: reset attribute= "false".
>>>> log4j: Threshold ="null".
>>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>>> log4j: Setting [ProgressAppender] additivity to [false].
>>>> log4j: Level value for ProgressAppender is  [INFO].
>>>> log4j: ProgressAppender level set to INFO
>>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>>> log4j: Setting property [conversionPattern] to [%m].
>>>> log4j: Adding appender named [noEolAppender] to category
>>>> [ProgressAppender].
>>>> log4j: Retreiving an instance of org.apache.log4j.Logger.
>>>> log4j: Setting [ProgressDone] additivity to [false].
>>>> log4j: Level value for ProgressDone is  [INFO].
>>>> log4j: ProgressDone level set to INFO
>>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>>> log4j: Setting property [conversionPattern] to [%m%n].
>>>> log4j: Adding appender named [eolAppender] to category [ProgressDone].
>>>> log4j: Level value for root is  [INFO].
>>>> log4j: root level set to INFO
>>>> log4j: Class name: [org.apache.log4j.ConsoleAppender]
>>>> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
>>>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
>>>> HH:mm:ss} %5p %c{1} - %m%n].
>>>> log4j: Adding appender named [consoleAppender] to category [root].
>>>> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
>>>> org/apache/ctakes/chunker/models/chunker-model.zip
>>>> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
>>>> state machines loaded.
>>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>>>> dictionary lookup window type:
>>>> org.apache.ctakes.typesystem.type.textspan.Sentence
>>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
>>>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>>>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
>>>> term text span: 3
>>>> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
>>>> Dictionary Descriptor:
>>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
>>>> dictionary specifications:
>>>> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>>> harish1234:
>>>> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>>> harish1234 has been validated
>>>>
>>>> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
>>>> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
>>>> no_rx_16ab/sno_rx_16ab:
>>>> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
>>>> ..................
>>>> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
>>>> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
>>>> and term table CUI_TERMS
>>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>>> table TUI with class TUI
>>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>>> table RXNORM with class LONG
>>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>>> table PREFTERM with class PREFTERM
>>>> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
>>>> table SNOMEDCT_US with class LONG
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>>>> sizes: 10 , 10
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>>>> LEFT,RIGHT
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>>>> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
>>>> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
>>>> called for ContextInitializer
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>>>> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>>>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
>>>> sizes: 7 , 7
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
>>>> LEFT,RIGHT
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
>>>> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
>>>> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
>>>> initBoundaryData() called for ContextInitializer
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
>>>> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
>>>> type: org.apache.ctakes.typesystem.type.textspan.Sentence
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
>>>> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
>>>> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
>>>> org.apache.ctakes.typesystem.type.syntax.BaseToken
>>>> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
>>>> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
>>>> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
>>>> org/apache/ctakes/postagger/models/mayo-pos.zip
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
>>>> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
>>>> bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
>>>> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>> 4.0.0\resources\org\apache\ctakes\lvg\
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
>>>> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
>>>> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
>>>> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
>>>> machines loaded.
>>>> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
>>>> analysis? true Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> .....................................................................
>>>> ...................
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading model:
>>>> .
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> ...
>>>> <various Loading model>
>>>> .
>>>> Loading configuration.
>>>> Loading feature templates.
>>>> Loading lexica.
>>>> Loading model:
>>>> ................................
>>>> Loading model:
>>>> .............................
>>>> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
>>>> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
>>>> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
>>>> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
>>>> process(JCas)
>>>> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
>>>> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
>>>> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
>>>> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
>>>> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
>>>> processing
>>>> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
>>>> processing
>>>> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
>>>> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
>>>> idd_secondTrial.txt
>>>> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
>>>> idd_secondTrial.txt
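
The timestamps in the log above show where the time goes: the
MaxentParserWrapper entries at 09:48:45 and 10:20:19 are about 31.5
minutes apart. A small script can list such gaps for a whole log. This is
a sketch under assumptions: it expects raw log lines (without the mail
quoting) in the "dd MMM yyyy HH:mm:ss" pattern shown above, needs an
English locale to parse month names such as "Dec", and attributes each gap
to the logger that wrote the earlier entry, which is only a heuristic.

```python
import re
from datetime import datetime

# Matches e.g. "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started"
LOG_LINE = re.compile(
    r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+)\s+-"
)


def stage_gaps(lines):
    """Return (seconds_until_next_entry, logger) pairs, largest gap first."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            entries.append((stamp, match.group(3)))
    # Pair each entry with the one that follows it and measure the gap.
    gaps = [
        ((later[0] - earlier[0]).total_seconds(), earlier[1])
        for earlier, later in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)
```

Lines that do not start with a timestamp, such as wrapped continuation
lines, are ignored.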
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Harish.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Miller, Timothy [mailto:[email protected]]
>>>> Sent: Thursday, December 14, 2017 9:16 AM
>>>> To: [email protected]
>>>> Subject: Re: Slowness in processing files [EXTERNAL]
>>>>
>>>> Do not try to use AggregatePlainTextProcessor, it is just slow.
>>>> Use the fast version and debug the password issues.
>>>> Make sure you have your UMLS credentials set in:
>>>> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>>
>>>> in two different places.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
>>>>
>>>>>
>>>>> Hi James,
>>>>>   Thanks for responding.
>>>>>   A single 2 MB file is taking ~5 hours to process with
>>>>> AggregatePlainTextProcessor. This is what the command looks like,
>>>>> including the JVM memory arguments:
>>>>>   C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>>> -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp
>>>>> "C:\New_Drive\apache-ctakes-4.0.0-bi
>>>>> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
>>>>> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
>>>>> 4.0.0-
>>>>> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
>>>>> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
>>>>> ctakes-
>>>>> 4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>>   Also, just now I tried to process the file with the AE
>>>>> AggregatePlaintextFastUMLSProcessor but ran into a different problem:
>>>>> an authentication error, even though I used the same username and
>>>>> password as with AggregatePlainTextProcessor.
>>>>>   I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and
>>>>> -Xmx5g. Could you please let me know how it is possible that the AE
>>>>> AggregatePlainTextProcessor runs fine with the above username and
>>>>> password, but AggregatePlaintextFastUMLSProcessor throws the exception
>>>>> below with the same username and password?
>>>>>   Exception:
>>>>> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
>>>>> -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp
>>>>> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
>>>>> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
>>>>> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
>>>>> org.apache.uima.tools.cpm.CpmFrame
>>>>> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
>>>>> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
>>>>> at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>>>> log4j: attributes....
>>>>> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file:
>>>>> org/apache/ctakes/chunker/models/chunker-model.zip
>>>>> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing
>>>>> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
>>>>> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite
>>>>> state machines loaded.
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
>>>>> dictionary lookup window type:
>>>>> org.apache.ctakes.typesystem.type.textspan.Sentence
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion
>>>>> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
>>>>> VBD VBG VBN VBP VBZ WDT WP WPS WRB
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum
>>>>> term text span: 3
>>>>> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
>>>>> Dictionary Descriptor:
>>>>> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>>>>> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing
>>>>> dictionary specifications:
>>>>> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at
>>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
>>>>> harish1234-ß:
>>>>> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at
>>>>> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for
>>>>> user XXXXXXX-ß with XXXXXXX
>>>>> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>>>>>     at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>>>>>     at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>>>>>     at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
>>>>> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>>>>>     at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>>>>>     at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>>>>>     ... 5 more
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>>>>>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>>>>>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>>>>>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>>>>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>>>>>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>>>>>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>>>>>     at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>>>>>     ... 9 more
>>>>> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>>>>>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>>>>>     ... 24 more
>>>>> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>>>>>     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>>>>>     ... 25 more
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>>>>>     at java.lang.reflect.Constructor.newInstance(Unknown Source)
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>>>>>     ... 28 more
>>>>> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>>>>>     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>>>>>     ... 33 more
>>>>>       From: James Masanz [mailto:[email protected]]
>>>>> Sent: Wednesday, December 13, 2017 8:56 PM
>>>>> To: [email protected]
>>>>> Subject: Re: Slowness in processing files
>>>>>   Using AggregatePlaintextFastUMLSProcessor  is much faster than
>>>>> AggregatePlainTextProcessor, so I suggest that to start with you
>>>>> just use AggregatePlaintextFastUMLSProcessor.
>>>>>   Do you mean it is taking ~5 hours for a single file to be processed
>>>>> at times, or is that for a set of files?
>>>>>   If your JVM heap space is not set large enough, you can get very
>>>>> slow results.
>>>>> Try increasing to 5G (or more) using the JVM parameter -Xmx5G.
>>>>> For faster start up, you can also set -Xms to the same value as -Xmx,
>>>>> or something close to it.
>>>>>     -- James
>>>>>   On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]>
>>>>> wrote:
>>>>> Hi All,
>>>>>   When the medical records are run with the AE
>>>>> AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
>>>>> the processing is very slow. It is pretty fast when smaller files
>>>>> (~2 KB) are fed as input, but with bigger files, say 2 MB, it is very
>>>>> slow and the files take ~5 hours to process. Any pointers would be of
>>>>> great help.
>>>>>   Regards,
>>>>> Harish.
>>>>>
>>>>>
>>>>
>>>
>
