Hi Michael,

Thank you for the follow-up and the link. I decided to continue working with the ConceptMapper project since I have made good progress using it. Please keep us posted should you have any documentation for it.

Best wishes,
Ahmed

On Thu, Jun 19, 2008 at 5:18 AM, Michael Baessler <[EMAIL PROTECTED]> wrote:

> Hi Ahmed,
>
> here is the link to the discussion.
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01277.html
>
> There are some minor differences in the capabilities. I think the
> discussion will show you the details and help you decide which
> component you need. For the DictionaryAnnotator there is an official
> release with documentation available. You get it with the
> Annotator-Addons package on the UIMA download page.
>
> -- Michael
>
> Michael Tanenblatt wrote:
> > There is some in-depth discussion about this in the UIMA User mailing
> > list--check the archives. The subject line was "Any interest in this as
> > an open source project?", and it was from May 2008 or possibly started
> > at the end of April.
> >
> > On Jun 18, 2008, at 12:33 PM, Ahmed Abdeen Hamed wrote:
> >
> >> Thanks for the response. I am still not sure about some aspects of it.
> >> I just found out that the UIMA framework has the following
> >> DictionaryAnnotator feature:
> >>
> >> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/DictionaryAnnotator/doc/pdf/DictionaryAnnotatorUserGuide.pdf
> >>
> >> This is similar to what the ConceptMapper is doing. Is there any
> >> advantage over the DictionaryAnnotator?
> >>
> >> Thank you!
> >> Ahmed
> >>
> >> On Wed, Jun 18, 2008 at 10:23 AM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
> >>
> >>> My original message regarding this talks some about the dictionary
> >>> format. I am in the process of writing a paper describing the whole
> >>> of ConceptMapper, but that is not yet done. Here is what I wrote
> >>> before:
> >>>
> >>>> The structure of the dictionary itself is quite flexible. Entries
> >>>> can have any number of variants (synonyms), and arbitrary features
> >>>> can be associated with dictionary entries. Individual variants
> >>>> inherit features from the parent token (i.e., the canonical form),
> >>>> but can override them or add additional features. In the following
> >>>> sample dictionary entry, there are 5 variants of the canonical
> >>>> form, and as described earlier, each inherits the SemClass and POS
> >>>> attributes from the canonical form, with the exception of the
> >>>> variant "mesenteric fibromatosis (c48.1)", which overrides the
> >>>> value of the SemClass attribute (this is somewhat of a contrived
> >>>> example, just to make that point):
> >>>>
> >>>> <token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
> >>>>    <variant base="abdominal fibromatosis" />
> >>>>    <variant base="abdominal desmoid" />
> >>>>    <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
> >>>>    <variant base="mesenteric fibromatosis" />
> >>>>    <variant base="retroperitoneal fibromatosis" />
> >>>> </token>
> >>>
> >>> So, testDict.xml is just an example. Two key AE descriptor
> >>> parameters are "AttributeList" and "FeatureList", which provide the
> >>> means to map from the XML attributes to the target annotation
> >>> features. If your target annotation were called "DictTerm" and the
> >>> DictTerm had the features "canonicalForm", "semanticClass" and
> >>> "partOfSpeechTag", using the example dictionary snippet shown above,
> >>> you would set AttributeList to:
> >>>
> >>>    DictCanon
> >>>    SemClass
> >>>    POS
> >>>
> >>> and you would set FeatureList to:
> >>>
> >>>    canonicalForm
> >>>    semanticClass
> >>>    partOfSpeechTag
> >>>
> >>> Then, when one of the variants is matched in the text, a new
> >>> DictTerm would be created with its semanticClass set to the value of
> >>> the SemClass attribute and its partOfSpeechTag set to the value of
> >>> the POS attribute.
> >>>
> >>> One important point: matches are only performed against the strings
> >>> given as values of the "variant" tag's "base" attribute. It is
> >>> common practice for the "token" element to have a canonical form
> >>> that is the same as one of the variants, but that is not required.
> >>>
> >>> I hope this helps!
> >>>
> >>> On Jun 18, 2008, at 10:06 AM, Ahmed Abdeen Hamed wrote:
> >>>
> >>>> Thanks Michael! I only recently joined the list so I missed the
> >>>> early posting. I like this example a lot. I was able to get it to
> >>>> run using the document analyzer from the uimaj-example. I have some
> >>>> questions though: Is the testDict.xml just an arbitrary xml file,
> >>>> which means any well-formed xml file should work? How do I get my
> >>>> own xml dictionary files to work without transforming them into the
> >>>> xml format in your testDict.xml file? Is there documentation for
> >>>> this so that I can understand it on my own without bugging the
> >>>> entire list?
> >>>>
> >>>> Thanks!
> >>>> Ahmed
> >>>>
> >>>> On Tue, Jun 17, 2008 at 8:05 PM, Michael Tanenblatt <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> As Thilo mentioned in an email from May 19, 2008, I forgot to
> >>>>> include the source for uima.tt.TokenAnnotation, but otherwise the
> >>>>> code should be fine.
> >>>>>
> >>>>> Additionally, the problem you are seeing is with OffsetTokenizer,
> >>>>> which is just a sample tokenizer--if you have another, more robust
> >>>>> tokenizer, you don't need this OffsetTokenizer.
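
For readers following the thread, the inheritance and mapping rules described in the quoted message can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not ConceptMapper code: the function name resolve_variants is hypothetical, and the sketch uses the XML attribute name "canonical" as the first AttributeList entry (the message lists "DictCanon") so that the lookup resolves against the sample entry. That substitution is an assumption.

```python
import xml.etree.ElementTree as ET

# Sample dictionary entry copied from the quoted message.
SAMPLE_ENTRY = """
<token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
    <variant base="abdominal fibromatosis" />
    <variant base="abdominal desmoid" />
    <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
    <variant base="mesenteric fibromatosis" />
    <variant base="retroperitoneal fibromatosis" />
</token>
"""


def resolve_variants(token_xml, attribute_list, feature_list):
    """Hypothetical helper: map each variant's base string to its features.

    attribute_list and feature_list are parallel lists, mirroring the
    "AttributeList" and "FeatureList" parameters described in the thread.
    """
    token = ET.fromstring(token_xml)
    resolved = {}
    for variant in token.findall("variant"):
        # Start from the parent token's attributes (inheritance) ...
        attrs = dict(token.attrib)
        # ... then let the variant override or add attributes of its own.
        attrs.update({k: v for k, v in variant.attrib.items() if k != "base"})
        # Only the "base" string is used as the match key.
        resolved[variant.get("base")] = {
            feature: attrs.get(attribute)
            for attribute, feature in zip(attribute_list, feature_list)
        }
    return resolved


features = resolve_variants(
    SAMPLE_ENTRY,
    ["canonical", "SemClass", "POS"],  # assumed attribute names, see note above
    ["canonicalForm", "semanticClass", "partOfSpeechTag"],
)
```

With the sample entry, features["abdominal desmoid"] carries the inherited SemClass and POS values, while features["mesenteric fibromatosis (c48.1)"] carries the overridden SemClass value.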
