Hi James,

Below is the CAS consumer detail:

FileWriterCasConsumer

Descriptor in collection reader:

FilesInDirectoryCollectionReader.xml

The contents of AggregatePlaintextFastUMLSProcessor have not been changed, and I
have always used the CPE GUI with the Clear All option. I am not sure about the
hard drive error logs, but will check them as one of the possibilities.

Could you please let me know approximately how long it took you to process
files of ~2 MB (or share any other benchmarks for the other file sizes you used
earlier)?

Regards,
Harish.

From: James Masanz [mailto:[email protected]]
Sent: Thursday, December 14, 2017 1:21 PM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

Sorry, I meant verify that the contents of the XML file for the fast
dictionary lookup haven't changed (AggregatePlaintextFastUMLSProcessor).

On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <[email protected]> wrote:

Harish,

with the AggregatePlaintextFastUMLSProcessor, it should not be taking that 
long. It sounds like either something outside of cTAKES is having an issue (a 
hard drive starting to fail) or that you are accidentally running 
AggregatePlaintextUMLSProcessor.

I've had issues with the CPE GUI not always behaving well for me.

I suggest when you run the CPE GUI, you use File->Clear all and 
re-enter/re-select what you want.
If that doesn't help, verify that the contents of 
AggregatePlaintextUMLSProcessor haven't been changed.

If none of that helps, as a last resort, I'd look into hard drive error logs.

Also, are you using a CAS Consumer? If so, which one?


On Thu, Dec 14, 2017 at 12:04 PM, <[email protected]> wrote:
If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take
~11*1000 seconds, which is about 3 hours (assuming that the runtime is linear
in the file size).
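
The estimate can be checked with a few lines of Python. This is only a sketch of the linear-scaling assumption stated in this message. The function name linear_estimate is made up for illustration, and sizes are taken as decimal (2 KB = 2000 bytes, 2 MB = 2,000,000 bytes).

```python
def linear_estimate(small_seconds, small_bytes, large_bytes):
    """Scale a measured runtime linearly with file size (an assumption, not a measurement)."""
    return small_seconds * (large_bytes / small_bytes)


# Figures from the thread: ~11 seconds for a ~2 KB file, target file of ~2 MB.
seconds = linear_estimate(11, 2_000, 2_000_000)
print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")  # prints "11000 s = 3.06 h"
```

If the real runtime grows faster than linearly with file size, the actual time would be longer than this figure.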

I do not know if the pipeline can be sped up. I would suggest splitting the
file into smaller chunks and running the pipeline in parallel on each chunk.
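
A minimal sketch of the chunking step, assuming the input is plain text and that breaking at line ends is acceptable for the documents in question. The function name chunk_lines and the max_chars default are illustrative and not part of cTAKES. Running the pipeline on the chunks is left out here.

```python
def chunk_lines(text, max_chars=2000):
    """Split text into pieces of roughly max_chars, breaking only at line ends.

    A single line longer than max_chars is kept whole, so that piece can
    exceed the limit.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would pass the limit.
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


# Small demonstration on synthetic text.
demo = "".join(f"line {i}\n" for i in range(100))
pieces = chunk_lines(demo, max_chars=50)
print(len(pieces), "chunks; text preserved:", "".join(pieces) == demo)
```

Each piece could then be written to its own file in the input directory. Note that annotation offsets would be relative to the chunk rather than to the original file, and that a sentence spanning a chunk boundary would only be split if it also spans a line break.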

Jonas S

On 14.12.17 at 17:48, Yadav, Harish wrote:
Hi Timothy,

Sorry for the typo. I meant that I ran with the AE
AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes a
long time (more than 2 hours) for a single 2 MB file. It runs fine for a 2 KB
file.

Regards,
Harish.

-----Original Message-----
From: Miller, Timothy [mailto:[email protected]]
Sent: Thursday, December 14, 2017 11:22 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

You missed the most important part of my message:
Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim


On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
Hi Timothy,

I fixed the password issues and ran with the AE
AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
long time (more than 2 hours) for a single 2 MB file. I
have checked the memory consumption of the process and it never goes
above 4.5 GB, so I am not sure whether memory is the issue. The AE
AggregatePlainTextProcessor processes the 2 KB file in ~11 seconds, but
most of our files are megabytes in size, so a processing time of more
than 2 hours per file is not feasible.

Could you please suggest something that may improve the performance?
Below are the logs for processing the 2 MB file with
AggregatePlainTextProcessor:



Logs:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
"C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
org.apache.uima.tools.cpm.CpmFrame
Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
at root 0x80000002. Windows
RegCreateKeyEx(...) returned error code 5.
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category
[ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
org/apache/ctakes/chunker/models/chunker-model.zip
14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite
state machines loaded.
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
dictionary lookup window type:
org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
VBD VBG VBN VBP VBZ WDT WP WPS WRB
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
term text span: 3
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
Dictionary Descriptor:
org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
dictionary specifications:
14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
.14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234
has been validated

14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
no_rx_16ab/sno_rx_16ab:
14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
..................
14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
and term table CUI_TERMS
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table TUI with class TUI
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table RXNORM with class LONG
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table PREFTERM with class PREFTERM
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
table SNOMEDCT_US with class LONG
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
sizes: 10 , 10
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
org.apache.ctakes.necontexts.status.StatusContextAnalyzer
14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
org.apache.ctakes.necontexts.status.StatusContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope
sizes: 7 , 7
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
org/apache/ctakes/postagger/models/mayo-pos.zip
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
4.0.0\resources\org\apache\ctakes\lvg\
14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
machines loaded.
14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
analysis? true Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
.....................................................................
...................
Loading configuration.
Loading feature templates.
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
...
<various Loading model>
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
process(JCas)
14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
processing
14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
processing
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
idd_secondTrial.txt
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
idd_secondTrial.txt
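
One way to see where the time goes in a log like the one above is to compute the gap between consecutive timestamped lines. The following is a minimal Python sketch and not part of cTAKES. The function name slowest_gaps and the SAMPLE excerpt (five entries copied from the log above, with the wrapped file name joined back onto its line) are chosen for illustration, and the regular expression assumes the "dd MMM yyyy HH:mm:ss" layout set by the conversionPattern shown earlier in the log.

```python
import re
from datetime import datetime

# Timestamp, log level, logger name, then " -" (the message may be empty).
# A leading "." is allowed because progress dots sometimes precede an entry.
LINE = re.compile(r"^\.?(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+\w+\s+(\w+) -")


def slowest_gaps(log_text, top=3):
    """Return the largest gaps as (seconds, logger of the entry before the gap)."""
    stamped = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            stamped.append((when, match.group(2)))
    gaps = [
        (int((later[0] - earlier[0]).total_seconds()), earlier[1])
        for earlier, later in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


SAMPLE = """\
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt
"""

for seconds, logger in slowest_gaps(SAMPLE):
    print(f"{seconds:5d} s after a {logger} entry")
```

On this excerpt the largest gap is 1894 seconds (about 31.5 minutes), between the MaxentParserWrapper "Started processing" and "Done parsing" entries. The gap is attributed to the logger that wrote the earlier line, which is only a rough indication of which component was running.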







Regards,
Harish.


-----Original Message-----
From: Miller, Timothy [mailto:[email protected]]
Sent: Thursday, December 14, 2017 9:16 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

Do not try to use AggregatePlainTextProcessor, it is just slow.
Use the fast version and debug the password issues.
Make sure you have your UMLS credentials set in:
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml

in two different places.

Tim



On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:

Hi James,
  Thanks for responding.
  A single 2 MB file is taking ~5 hours to process with
AggregatePlainTextProcessor. This is what the command looks like,
including the JVM memory arguments:
  C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
  Also, just now I tried to process the file with the AE
AggregatePlaintextFastUMLSProcessor but ran into a different problem:
an authentication error, with the same username and password that
are used with AggregatePlainTextProcessor.
  I can run it with AggregatePlaintextFastUMLSProcessor with the heap
increased to -Xms5g and -Xmx5g, if you could please let me know how it
is possible that the AE AggregatePlainTextProcessor runs fine with the
above username and password, while AggregatePlaintextFastUMLSProcessor
gives the exception below with the same username and password.
  Exception:
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX" -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
        at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
        at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
        at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
        at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
        at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
        at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
        at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
        ... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
        at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
        at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
        at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
        at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
        at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
        at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
        at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
        ... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
        at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
        ... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
        at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
        ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
        ... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
        at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
        ... 33 more

From: James Masanz [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 8:56 PM
To: [email protected]
Subject: Re: Slowness in processing files
  Using AggregatePlaintextFastUMLSProcessor is much faster than
AggregatePlainTextProcessor, so I suggest that to start with you
just use AggregatePlaintextFastUMLSProcessor.
  Do you mean it is taking ~5 hours for a single file to be processed
at times, or is that for a set of files?
  If your JVM heap space is not set large enough, you can get very
slow results.
Try increasing it to 5G (or more) using the JVM parameter -Xmx5G. For
faster start up, you can also set -Xms to the same value as -Xmx, or
to something close to it.
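
As an illustration of the heap advice above, the sketch below assembles a CPE GUI launch command with -Xms and -Xmx set to the same value. It is a hypothetical helper (cpe_gui_command is not part of cTAKES). The classpath entries, the log4j setting and the main class are copied from the commands quoted elsewhere in this thread, and the Windows-style separators are an assumption about the target machine.

```python
def cpe_gui_command(ctakes_home, heap="5g"):
    """Build the argument list for launching the CPE GUI with a fixed heap size.

    ctakes_home is the unpacked cTAKES directory; heap is a JVM size such as
    "5g" or "6g". Paths use Windows separators, as in the thread.
    """
    classpath = ";".join([
        ctakes_home + "\\desc\\",
        ctakes_home + "\\resources\\",
        ctakes_home + "\\lib\\*",
    ])
    return [
        "java",
        "-cp", classpath,
        "-Dlog4j.configuration=file:\\" + ctakes_home + "\\config\\log4j.xml",
        f"-Xms{heap}",  # initial heap, set equal to the maximum
        f"-Xmx{heap}",  # maximum heap
        "org.apache.uima.tools.cpm.CpmFrame",
    ]


cmd = cpe_gui_command(r"C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0")
print(" ".join(cmd))
```

The list can be passed to subprocess.run to start the GUI. Printing it first makes it easy to confirm that the heap flags actually reach the JVM.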
    -- James
  On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:
Hi All,
  When the medical records are run with the AE
AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
the processing is very slow. It is fairly fast when smaller files
(~2 KB) are fed as input, but bigger files of, say, 2 MB are very
slow and take ~5 hours to process. Any pointers would be a great help.
  Regards,
Harish.



