Re: [jira] Commented: (UIMA-1033) ConceptMapper--a highly configurable, token-based dictionary lookup UIMA component

Ahmed Abdeen Hamed Thu, 19 Jun 2008 06:15:10 -0700

Hi Michael,

Thank you for the follow-up and the link. I decided to continue to work with
the ConceptMapper project since I have made good progress using it. Please
keep us posted should you have any documentation for it.
Best wishes,
Ahmed


On Thu, Jun 19, 2008 at 5:18 AM, Michael Baessler <[EMAIL PROTECTED]>
wrote:

> Hi Ahmed,
>
> here is the link of the discussion.
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01277.html
>
> There are some minor differences in the capabilities. I think the
> discussion will show you the details and help you decide which component
> you need. For the DictionaryAnnotator there is an official release with
> documentation available. You get it with the Annotator-Addons package on
> the UIMA download page.
>
> -- Michael
>
> Michael Tanenblatt wrote:
> > There is some in-depth discussion about this in the UIMA User mailing
> > list--check the archives. The subject line was "Any interest in this as
> > an open source project?", and it was from May 2008 or possibly started
> > at the end of April.
> >
> >
> > On Jun 18, 2008, at 12:33 PM, Ahmed Abdeen Hamed wrote:
> >
> >> Thanks for the response. I am still not sure about some aspects of it. I
> >> just found out that the UIMA framework has the following
> >> DictionaryAnnotator feature:
> >>
> >> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/DictionaryAnnotator/doc/pdf/DictionaryAnnotatorUserGuide.pdf
> >>
> >>
> >> This is similar to what the ConceptMapper is doing. Is there any
> >> advantage over the DictionaryAnnotator?
> >>
> >> Thank you!
> >> Ahmed
> >>
> >> On Wed, Jun 18, 2008 at 10:23 AM, Michael Tanenblatt <
> >> [EMAIL PROTECTED]> wrote:
> >>
> >>> My original message regarding this talks some about the dictionary
> >>> format. I am in the process of writing a paper describing the whole of
> >>> ConceptMapper, but that is not yet done. Here is what I wrote before:
> >>>
> >>>> The structure of the dictionary itself is quite flexible. Entries can
> >>>> have any number of variants (synonyms), and arbitrary features can be
> >>>> associated with dictionary entries. Individual variants inherit features
> >>>> from the parent token (i.e., the canonical form), but can override them
> >>>> or add additional features. In the following sample dictionary entry,
> >>>> there are 5 variants of the canonical form, and as described earlier,
> >>>> each inherits the SemClass and POS attributes from the canonical form,
> >>>> with the exception of the variant "mesenteric fibromatosis (c48.1)",
> >>>> which overrides the value of the SemClass attribute (this is somewhat of
> >>>> a contrived example, just to make that point):
> >>>> <token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
> >>>>   <variant base="abdominal fibromatosis" />
> >>>>   <variant base="abdominal desmoid" />
> >>>>   <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
> >>>>   <variant base="mesenteric fibromatosis" />
> >>>>   <variant base="retroperitoneal fibromatosis" />
> >>>> </token>
> >>>>
> >>>
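The inheritance rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as the email states it, not ConceptMapper's actual loader; the function name resolve_variants is hypothetical, and the entry is copied from the sample above.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the email above.
ENTRY = """
<token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
  <variant base="abdominal fibromatosis" />
  <variant base="abdominal desmoid" />
  <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
  <variant base="mesenteric fibromatosis" />
  <variant base="retroperitoneal fibromatosis" />
</token>
"""


def resolve_variants(token_xml):
    """Return {variant base string: effective attributes} for one entry.

    Hypothetical helper: each variant starts from the token's attributes
    and then applies its own, so a variant can override or add values.
    """
    token = ET.fromstring(token_xml.strip())
    resolved = {}
    for variant in token.findall("variant"):
        attrs = dict(token.attrib)    # inherit from the parent token
        attrs.update(variant.attrib)  # variant-level overrides and additions
        base = attrs.pop("base")      # the string that is matched in text
        resolved[base] = attrs
    return resolved


variants = resolve_variants(ENTRY)
for base, attrs in variants.items():
    print(base, "->", attrs["SemClass"], attrs["POS"])
```

Under this sketch, only "mesenteric fibromatosis (c48.1)" ends up with a SemClass that differs from the token's value; the other four variants keep the inherited one.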
> >>> So, testDict.xml is just an example. Two key AE descriptor parameters are
> >>> "AttributeList" and "FeatureList", which provide the means to map from the
> >>> XML attributes to the target annotation features. If your target
> >>> annotation were called "DictTerm" and the DictTerm had the features
> >>> "canonicalForm", "semanticClass" and "partOfSpeechTag", using the example
> >>> dictionary snippet shown above, you would set AttributeList to:
> >>>
> >>>       DictCanon
> >>>       SemClass
> >>>       POS
> >>>
> >>> and you would set FeatureList to:
> >>>
> >>>       canonicalForm
> >>>       semanticClass
> >>>       partOfSpeechTag
> >>>
> >>> then, when one of the variants is matched in the text, a new DictTerm
> >>> would be created with its semanticClass set to the value of the SemClass
> >>> attribute and its partOfSpeechTag set to the value of the POS attribute.
> >>>
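The positional pairing of AttributeList and FeatureList can be pictured with a small Python sketch. It is an assumption-laden illustration, not the component's code: the function name map_attributes_to_features is hypothetical, and a plain dict stands in for the DictTerm annotation. Note also that the email lists "DictCanon" as the first attribute while the sample entry spells it "canonical"; the sketch uses the name from the sample entry, which is an assumption.

```python
def map_attributes_to_features(attribute_list, feature_list, entry_attributes):
    """Pair the i-th XML attribute name with the i-th annotation feature name.

    Hypothetical helper: returns a dict standing in for the features of a
    newly created DictTerm annotation.
    """
    if len(attribute_list) != len(feature_list):
        raise ValueError("AttributeList and FeatureList must have equal length")
    features = {}
    for attribute, feature in zip(attribute_list, feature_list):
        # Skip attributes that the matched entry does not define.
        if attribute in entry_attributes:
            features[feature] = entry_attributes[attribute]
    return features


# Assumed attribute names, taken from the sample dictionary entry.
attribute_list = ["canonical", "SemClass", "POS"]
feature_list = ["canonicalForm", "semanticClass", "partOfSpeechTag"]

# Effective attributes of the matched variant "mesenteric fibromatosis (c48.1)".
matched = {
    "canonical": "abdominal fibromatosis",
    "SemClass": "Diagnosis-Site",
    "POS": "NN",
}

dict_term = map_attributes_to_features(attribute_list, feature_list, matched)
print(dict_term)
```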
> >>> One important point: matches are only performed against the strings
> >>> given as values of the "variant" element's "base" attribute. It is common
> >>> practice for the "token" element to carry a canonical form that is the
> >>> same as one of the variants, but that is not required.
> >>>
> >>> I hope this helps!
> >>>
> >>>
> >>>
> >>> On Jun 18, 2008, at 10:06 AM, Ahmed Abdeen Hamed wrote:
> >>>
> >>>> Thanks, Michael! I only recently joined the list so I missed the earlier
> >>>> posting. I like this example a lot. I was able to get it to run using the
> >>>> document analyzer from the uimaj-example. I have some questions though:
> >>>> Is the testDict.xml just an arbitrary XML file, which means any
> >>>> well-formed XML file should work? How do I get my own XML dictionary
> >>>> files to work without transforming them into the XML format in your
> >>>> testDict.xml file? Is there documentation for this so that I can
> >>>> understand it on my own without bugging the entire list? Thanks!
> >>>> Ahmed
> >>>>
> >>>> On Tue, Jun 17, 2008 at 8:05 PM, Michael Tanenblatt <
> >>>> [EMAIL PROTECTED]>
> >>>> wrote:
> >>>>
> >>>>> As Thilo mentioned in an email from May 19, 2008, I forgot to include
> >>>>> the source for uima.tt.TokenAnnotation, but otherwise the code should be
> >>>>> fine.
> >>>>>
> >>>>> Additionally, the problem you are seeing is with OffsetTokenizer, which
> >>>>> is just a sample tokenizer--if you have another, more robust tokenizer,
> >>>>> you don't need this OffsetTokenizer.
> >>>>>
> >>>>>
> >>>>>
> >>>
> >
>
>
