Yes, CompileDictionary.java will do it. But if dictionary loading time
is not the problem, I wouldn't bother doing that as it will not buy
you much. Combining the dictionaries, for now, should make the biggest
difference.
On Jun 23, 2008, at 3:02 PM, Ahmed Abdeen Hamed wrote:
Thanks Michael. Dictionary processing time is reasonable. It's the
document analyzer execution time that is the bottleneck. I will merge the
dictionaries and compile them as you suggested. However, I am not sure
which command line tool you are referring to. Do you mean
org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
Thanks for the vacation heads up.
Ahmed
On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
The short answer is "no". Not yet, anyway.
But here are some things that might help. First, if dictionary loading
times are long, you can use the command line tool supplied in the package
to compile the dictionary, and use the compiled dictionary. If you do this,
remember that you will need to change the AE descriptors to use the correct
implementation of the dictionary loader, e.g.:
<externalResource>
...
<implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
...
</externalResource>
That said, if you are using 13 dictionaries, that means you are running 13
copies of ConceptMapper in your pipeline, which means that you are
traversing each file's text 13 times just for your ConceptMapper
invocations. If you could merge the dictionaries into one, you should see a
marked speedup. Clearly, a near-term enhancement of ConceptMapper would be
to enable the loading of multiple dictionaries, which get merged at
initialization time.
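Until such an enhancement exists, the merge can be done offline. The following is only a minimal sketch, not part of ConceptMapper: the class DictionaryMerger and its merge method are hypothetical names, and it assumes that every dictionary is an XML file with the same root element and that entries are the direct child elements of that root. It copies the entries of each dictionary under the root of the first one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: merges XML dictionaries that share one root element. */
public class DictionaryMerger {

    /** Returns one XML string holding the entries of all given dictionaries. */
    public static String merge(List<String> dictionaries) throws Exception {
        if (dictionaries.isEmpty()) {
            throw new IllegalArgumentException("no dictionaries given");
        }
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // The first dictionary becomes the merge target.
        Document merged = parse(builder, dictionaries.get(0));
        Element mergedRoot = merged.getDocumentElement();

        for (String xml : dictionaries.subList(1, dictionaries.size())) {
            Element root = parse(builder, xml).getDocumentElement();
            // Assumption: all dictionaries use the same root element name.
            if (!mergedRoot.getNodeName().equals(root.getNodeName())) {
                throw new IllegalArgumentException(
                    "root element mismatch: " + root.getNodeName());
            }
            // Deep-copy every entry element into the merged document.
            for (Node child = root.getFirstChild(); child != null;
                 child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    mergedRoot.appendChild(merged.importNode(child, true));
                }
            }
        }

        // Serialize the merged DOM back to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(merged), new StreamResult(out));
        return out.toString();
    }

    private static Document parse(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

Calling DictionaryMerger.merge with the contents of the 13 dictionary files would yield one document to write out and, if desired, compile. Attributes on each entry are copied unchanged, so if the source dictionary matters downstream, one option is to tag each entry with an attribute before merging.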
One side note: I am going to be on vacation starting on June 25 and will
only have occasional access to email until I return on July 12. I will try
to answer questions during that time when I do have access, but I really
have no idea how often that will be.
On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
Hello UIMA members,
I am using the document analyzer example to analyze large files with
multiple dictionaries. One of the raw files is 7.5MB. There are 13
dictionaries, each 1MB in size. Is there some sort of a matrix that you can
use to predict the execution time? Has anyone written a paper on the
performance analysis of ConceptMapper?
Please let me know if you can.
Best wishes,
--------------------------------------------------------
Ahmed Abdeen Hamed
Scientific Informatics Project Leader
MBLWHOI Library
Marine Biological Laboratory
7 MBL Street Woods Hole, MA 02543 USA
+1 508 289 7676
--
email: [EMAIL PROTECTED]
--