Great, I will combine the dictionaries then. It's also good to know that compiled dictionaries make a difference if dictionary loading is the bottleneck. Have a good vacation.
Ahmed
On Mon, Jun 23, 2008 at 3:06 PM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:

> Yes, CompileDictionary.java will do it. But if dictionary loading time is
> not the problem, I wouldn't bother doing that as it will not buy you much.
> Combining the dictionaries, for now, should make the biggest difference.
>
> On Jun 23, 2008, at 3:02 PM, Ahmed Abdeen Hamed wrote:
>
>> Thanks Michael. Dictionary processing time is reasonable. It's the
>> document analyzer execution time that is the bottleneck. I will merge the
>> dictionaries and compile them as you suggested. However, I am not sure
>> which command line tool you are referring to. Do you mean:
>> org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
>> Thanks for the vacation heads up.
>> Ahmed
>>
>> On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
>>
>>> The short answer is "no". Not yet, anyway.
>>>
>>> But here are some things that might help. First, if dictionary loading
>>> times are long, you can use the command line tool supplied in the package
>>> to compile the dictionary, and use the compiled dictionary. If you do
>>> this, remember that you will need to change the AE descriptors to use the
>>> correct implementation of the dictionary loader, e.g.:
>>>
>>> <externalResource>
>>>   ...
>>>   <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
>>>   ...
>>> </externalResource>
>>>
>>> That said, if you are using 13 dictionaries, that means you are running
>>> 13 copies of ConceptMapper in your pipeline, which means that you are
>>> traversing each file's text 13 times just for your ConceptMapper
>>> invocations. If you could merge the dictionaries into one, you should see
>>> a marked speedup.
>>>
>>> Clearly, a near-term enhancement of ConceptMapper would be to enable the
>>> loading of multiple dictionaries, which get merged at initialization time.
>>>
>>> One side note: I am going to be on vacation starting on June 25 and will
>>> only have occasional access to email until I return on July 12. I will
>>> try to answer questions during that time when I do have access, but I
>>> really have no idea how often that will be.
>>>
>>> On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
>>>
>>>> Hello UIMA members,
>>>> I am using the document analyzer example to analyze large files from
>>>> multiple dictionaries. One of the raw files is 7.5MB. The number of
>>>> dictionaries is 13, and each is 1MB in size. Is there some sort of a
>>>> matrix that you can use to predict the execution time? Has anyone
>>>> written a paper on the performance analysis of ConceptMapper?
>>>> Please let me know if you can.
>>>> Best wishes,
>>>> --------------------------------------------------------
>>>> Ahmed Abdeen Hamed
>>>> Scientific Informatics Project Leader
>>>> MBLWHOI Library
>>>> Marine Biological Laboratory
>>>> 7 MBL Street Woods Hole, MA 02543 USA
>>>> +1 508 289 7676
>>>> --
>>>> email: [EMAIL PROTECTED]
>>>> --
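
For anyone following the thread, the dictionary merge discussed above could be sketched roughly as follows. This is only a sketch under assumptions, not part of ConceptMapper: it assumes each dictionary is an XML file whose root element wraps a flat list of entry elements, and that all 13 files share the same root element and encoding (UTF-8). The class name DictionaryMerger and its methods are hypothetical; only standard JDK XML APIs are used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ConceptMapper.
class DictionaryMerger {

    // Copies every child element of each later root into the first document's root.
    static Document merge(List<Document> dictionaries) {
        if (dictionaries.isEmpty()) {
            throw new IllegalArgumentException("no dictionaries given");
        }
        Document target = dictionaries.get(0);
        Node targetRoot = target.getDocumentElement();
        for (Document source : dictionaries.subList(1, dictionaries.size())) {
            Node child = source.getDocumentElement().getFirstChild();
            for (; child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // importNode makes a deep copy owned by the target document.
                    targetRoot.appendChild(target.importNode(child, true));
                }
            }
        }
        return target;
    }

    // Parses each XML text, merges the documents, and returns the merged XML text.
    static String mergeXml(List<String> xmlTexts) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            List<Document> docs = new ArrayList<>();
            for (String xml : xmlTexts) {
                docs.add(builder.parse(new InputSource(new StringReader(xml))));
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(merge(docs)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not merge dictionaries", e);
        }
    }

    // Usage: java DictionaryMerger.java <output.xml> <input1.xml> <input2.xml> ...
    // Assumes all input files are UTF-8 encoded.
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: DictionaryMerger <output.xml> <input.xml>...");
            return;
        }
        List<String> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(Files.readString(Path.of(args[i])));
        }
        Files.writeString(Path.of(args[0]), mergeXml(inputs));
    }
}
```

Note that this sketch does not deduplicate entries that occur in more than one dictionary, and it does not record which source dictionary an entry came from. If the 13 dictionaries are meant to produce distinguishable annotations, the entries would presumably need some distinguishing attribute before merging.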
