Hi Ahmed, here is the link to the discussion:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01277.html

There are some minor differences in the capabilities. I think the
discussion will show you the details and help you decide which
component you need.

For the DictionaryAnnotator there is an official release with
documentation available. You get it with the Annotator-Addons package on
the UIMA download page.

-- Michael

Michael Tanenblatt wrote:
> There is some in-depth discussion about this in the UIMA User mailing
> list--check the archives. The subject line was "Any interest in this as
> an open source project?", and it was from May 2008 or possibly started
> at the end of April.
>
> On Jun 18, 2008, at 12:33 PM, Ahmed Abdeen Hamed wrote:
>
>> Thanks for the response. I am still not sure about some aspects of it. I
>> just found out that the UIMA framework has the following
>> DictionaryAnnotator feature:
>> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/DictionaryAnnotator/doc/pdf/DictionaryAnnotatorUserGuide.pdf
>>
>> This is similar to what the ConceptMapper is doing. Is there any
>> advantage over the DictionaryAnnotator?
>>
>> Thank you!
>> Ahmed
>>
>> On Wed, Jun 18, 2008 at 10:23 AM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
>>
>>> My original message regarding this talks some about the dictionary
>>> format. I am in the process of writing a paper describing the whole of
>>> ConceptMapper, but that is not yet done. Here is what I wrote before:
>>>
>>>> The structure of the dictionary itself is quite flexible. Entries can
>>>> have any number of variants (synonyms), and arbitrary features can be
>>>> associated with dictionary entries. Individual variants inherit
>>>> features from the parent token (i.e., the canonical form), but can
>>>> override them or add additional features.
>>>>
>>>> In the following sample dictionary entry, there are 5 variants of
>>>> the canonical form, and as described earlier, each inherits the
>>>> SemClass and POS attributes from the canonical form, with the
>>>> exception of the variant "mesenteric fibromatosis (c48.1)", which
>>>> overrides the value of the SemClass attribute (this is somewhat of a
>>>> contrived example, just to make that point):
>>>>
>>>> <token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
>>>>   <variant base="abdominal fibromatosis" />
>>>>   <variant base="abdominal desmoid" />
>>>>   <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
>>>>   <variant base="mesenteric fibromatosis" />
>>>>   <variant base="retroperitoneal fibromatosis" />
>>>> </token>
>>>
>>> So, testDict.xml is just an example. Two key AE descriptor parameters
>>> are "AttributeList" and "FeatureList", which provide the means to map
>>> from the XML attributes to the target annotation features. If your
>>> target annotation were called "DictTerm" and the DictTerm had the
>>> features "canonicalForm", "semanticClass" and "partOfSpeechTag", using
>>> the example dictionary snippet shown above, you would set
>>> AttributeList to:
>>>
>>>   DictCanon
>>>   SemClass
>>>   POS
>>>
>>> and you would set FeatureList to:
>>>
>>>   canonicalForm
>>>   semanticClass
>>>   partOfSpeechTag
>>>
>>> Then, when one of the variants is matched in the text, a new DictTerm
>>> would be created with its semanticClass set to the value of the
>>> SemClass attribute and its partOfSpeechTag set to the value of the POS
>>> attribute.
>>>
>>> One important point: matches are only performed against the strings
>>> given as values of the "variant" tag's "base" attribute. It is common
>>> practice for the "token" element to have a canonical form that is the
>>> same as one of the variants, but that is not required.
>>>
>>> I hope this helps!
>>>
>>> On Jun 18, 2008, at 10:06 AM, Ahmed Abdeen Hamed wrote:
>>>
>>>> Thanks Michael! I only recently joined the list so I missed the early
>>>> posting. I like this example a lot. I was able to get it to run using
>>>> the document analyzer from the uimaj-example. I have some questions
>>>> though: Is the testDict.xml just an arbitrary xml file, which means
>>>> any well-formed xml file should work? How do I get my own xml
>>>> dictionary files to work without transforming them into the xml
>>>> format in your testDict.xml file? Is there documentation for this so
>>>> that I can understand it on my own without bugging the entire list?
>>>>
>>>> Thanks!
>>>> Ahmed
>>>>
>>>> On Tue, Jun 17, 2008 at 8:05 PM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> As Thilo mentioned in an email from May 19, 2008, I forgot to
>>>>> include the source for uima.tt.TokenAnnotation, but otherwise the
>>>>> code should be fine.
>>>>>
>>>>> Additionally, the problem you are seeing is with OffsetTokenizer,
>>>>> which is just a sample tokenizer--if you have another, more robust
>>>>> tokenizer, you don't need this OffsetTokenizer.
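
To make the dictionary format and the AttributeList/FeatureList mapping quoted above more concrete, here is a minimal Python sketch. It is not ConceptMapper code and does not use the UIMA API; it only imitates the described behaviour with the standard library. The function names build_lookup and to_dict_term are hypothetical, matching is reduced to an exact string lookup (no tokenizer), and the attribute list uses "canonical" because that is the attribute name appearing in the sample XML, which is an assumption on my part since the thread lists "DictCanon".

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the thread above.
DICT_XML = """
<token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
  <variant base="abdominal fibromatosis" />
  <variant base="abdominal desmoid" />
  <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
  <variant base="mesenteric fibromatosis" />
  <variant base="retroperitoneal fibromatosis" />
</token>
"""

# Parallel lists: the i-th XML attribute feeds the i-th annotation feature.
# Assumption: "canonical" is used here because it is the attribute name in
# the sample XML; the thread itself lists "DictCanon" in this position.
ATTRIBUTE_LIST = ["canonical", "SemClass", "POS"]
FEATURE_LIST = ["canonicalForm", "semanticClass", "partOfSpeechTag"]


def build_lookup(xml_text):
    """Map each variant's "base" string to its resolved attributes."""
    root = ET.fromstring(xml_text)
    lookup = {}
    # iter() also yields the root itself when the root is a <token>.
    for token in root.iter("token"):
        for variant in token.findall("variant"):
            attrs = dict(token.attrib)    # inherit from the parent token
            attrs.update(variant.attrib)  # variant may override or add
            base = attrs.pop("base")      # only "base" strings are matched
            lookup[base] = attrs
    return lookup


def to_dict_term(matched_text, lookup):
    """Build a DictTerm-like dict for an exactly matching variant string."""
    attrs = lookup[matched_text]
    return {
        feature: attrs.get(attribute)
        for attribute, feature in zip(ATTRIBUTE_LIST, FEATURE_LIST)
    }


if __name__ == "__main__":
    lookup = build_lookup(DICT_XML)
    for text in ("abdominal desmoid", "mesenteric fibromatosis (c48.1)"):
        print(text, "->", to_dict_term(text, lookup))
```

In this sketch the variant "abdominal desmoid" picks up SemClass and POS from its parent token, while "mesenteric fibromatosis (c48.1)" keeps its own SemClass value, which is the inheritance and override behaviour described in the thread.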
