Re: [jira] Commented: (UIMA-1033) ConceptMapper--a highly configurable, token-based dictionary lookup UIMA component

Michael Baessler Thu, 19 Jun 2008 02:19:35 -0700

Hi Ahmed,

Here is the link to the discussion.


http://www.mail-archive.com/[EMAIL PROTECTED]/msg01277.html

There are some minor differences in the capabilities. I think the discussion
will show you the details and help you decide which component you need. For
the DictionaryAnnotator there is an official release with documentation
available. You get it with the Annotator-Addons package on the UIMA download
page.

-- Michael

Michael Tanenblatt wrote:
> There is some in-depth discussion about this in the UIMA User mailing
> list--check the archives. The subject line was "Any interest in this as
> an open source project?", and it was from May 2008 or possibly started
> at the end of April.
> 
> 
> On Jun 18, 2008, at 12:33 PM, Ahmed Abdeen Hamed wrote:
> 
>> Thanks for the response. I am still not sure about some aspects of it. I
>> just found out that the UIMA framework has the following
>> DictionaryAnnotator feature:
>> http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/DictionaryAnnotator/doc/pdf/DictionaryAnnotatorUserGuide.pdf
>>
>>
>> This is similar to what the ConceptMapper is doing. Is there any
>> advantage over the DictionaryAnnotator?
>>
>> Thank you!
>> Ahmed
>>
>> On Wed, Jun 18, 2008 at 10:23 AM, Michael Tanenblatt <
>> [EMAIL PROTECTED]> wrote:
>>
>>> My original message regarding this talks some about the dictionary
>>> format. I am in the process of writing a paper describing the whole of
>>> ConceptMapper, but that is not yet done. Here is what I wrote before:
>>>
>>>> The structure of the dictionary itself is quite flexible. Entries can
>>>> have any number of variants (synonyms), and arbitrary features can be
>>>> associated with dictionary entries. Individual variants inherit
>>>> features from the parent token (i.e., the canonical form), but can
>>>> override them or add additional features. In the following sample
>>>> dictionary entry, there are 5 variants of the canonical form, and as
>>>> described earlier, each inherits the SemClass and POS attributes from
>>>> the canonical form, with the exception of the variant
>>>> "mesenteric fibromatosis (c48.1)", which overrides the value of the
>>>> SemClass attribute (this is somewhat of a contrived example, just to
>>>> make that point):
>>>> <token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
>>>>   <variant base="abdominal fibromatosis" />
>>>>   <variant base="abdominal desmoid" />
>>>>   <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
>>>>   <variant base="mesenteric fibromatosis" />
>>>>   <variant base="retroperitoneal fibromatosis" />
>>>> </token>
>>>>
>>>
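The inheritance rule described for the sample entry can be illustrated with
a short Python sketch. This is not ConceptMapper's code: the function name
effective_attributes and the dict-merge approach are assumptions made for
illustration, using only the standard library XML parser on the entry quoted
above.

```python
import xml.etree.ElementTree as ET

# The sample dictionary entry from the message, unchanged in content.
SAMPLE = """
<token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
  <variant base="abdominal fibromatosis" />
  <variant base="abdominal desmoid" />
  <variant base="mesenteric fibromatosis (c48.1)" SemClass="Diagnosis-Site" />
  <variant base="mesenteric fibromatosis" />
  <variant base="retroperitoneal fibromatosis" />
</token>
"""


def effective_attributes(token_xml):
    """Map each variant's base string to its merged attributes (hypothetical helper)."""
    token = ET.fromstring(token_xml)
    result = {}
    for variant in token.findall("variant"):
        merged = dict(token.attrib)    # start from the parent token's attributes
        merged.update(variant.attrib)  # the variant's own values override or add
        base = merged.pop("base")      # the match string is the key, not a feature
        result[base] = merged
    return result


entries = effective_attributes(SAMPLE)
for base, attrs in entries.items():
    print(base, "->", attrs["SemClass"], attrs["POS"])
```

Only "mesenteric fibromatosis (c48.1)" ends up with a SemClass different from
the token's; every variant keeps the inherited POS.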
>>> So, testDict.xml is just an example. Two key AE descriptor parameters
>>> are "AttributeList" and "FeatureList", which provide the means to map
>>> from the XML attributes to the target annotation features. If your
>>> target annotation were called "DictTerm" and the DictTerm had the
>>> features "canonicalForm", "semanticClass" and "partOfSpeechTag", using
>>> the example dictionary snippet shown above, you would set AttributeList
>>> to:
>>>
>>>       DictCanon
>>>       SemClass
>>>       POS
>>>
>>> and you would set FeatureList to:
>>>
>>>       canonicalForm
>>>       semanticClass
>>>       partOfSpeechTag
>>>
>>> Then, when one of the variants is matched in the text, a new DictTerm
>>> would be created with its semanticClass set to the value of the
>>> SemClass attribute and its partOfSpeechTag set to the value of the POS
>>> attribute.
>>>
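As a rough illustration of the AttributeList / FeatureList mapping, here is a
minimal Python sketch. It assumes the two lists are paired by position, which
the message implies but does not state outright. The helper map_features and
the plain dict standing in for a DictTerm annotation are hypothetical. An
attribute named in AttributeList but absent from the matched entry is simply
skipped here (the sample entry has "canonical", not "DictCanon"); that
behaviour is also an assumption of the sketch, not documented behaviour.

```python
# Parameter values exactly as given in the message.
ATTRIBUTE_LIST = ["DictCanon", "SemClass", "POS"]
FEATURE_LIST = ["canonicalForm", "semanticClass", "partOfSpeechTag"]


def map_features(attributes, attribute_list, feature_list):
    """Pair attribute names with feature names by position (assumed rule)."""
    if len(attribute_list) != len(feature_list):
        raise ValueError("AttributeList and FeatureList must have the same length")
    return {
        feature: attributes[attr]
        for attr, feature in zip(attribute_list, feature_list)
        if attr in attributes  # skip attributes the entry does not carry
    }


# Effective attributes of the matched variant "mesenteric fibromatosis (c48.1)".
matched = {
    "canonical": "abdominal fibromatosis",
    "SemClass": "Diagnosis-Site",
    "POS": "NN",
}

dict_term = map_features(matched, ATTRIBUTE_LIST, FEATURE_LIST)
print(dict_term)
```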
>>> One important point: matches are only performed against the strings
>>> given as the value of the "variant" tag's "base" attribute. It is
>>> common practice for the "token" element to have a canonical form that
>>> is the same as one of the variants, but that is not required.
>>>
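The point about matching can be sketched in Python as well. The entry below
is a hypothetical variation of the sample in which the canonical form is not
repeated as a variant, and matchable_strings is an illustrative helper, not
part of ConceptMapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the canonical form does NOT appear among the variants.
ENTRY = """
<token canonical="abdominal fibromatosis" SemClass="Diagnosis" POS="NN">
  <variant base="abdominal desmoid" />
  <variant base="mesenteric fibromatosis" />
</token>
"""


def matchable_strings(token_xml):
    """Collect only the variants' base strings; the canonical form is ignored."""
    token = ET.fromstring(token_xml)
    return {variant.get("base") for variant in token.findall("variant")}


strings = matchable_strings(ENTRY)
print("abdominal desmoid" in strings)
print("abdominal fibromatosis" in strings)
```

Under this reading, text containing "abdominal fibromatosis" would not match
this entry unless that string were also added as a variant.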
>>> I hope this helps!
>>>
>>>
>>>
>>> On Jun 18, 2008, at 10:06 AM, Ahmed Abdeen Hamed wrote:
>>>
>>>> Thanks, Michael! I only recently joined the list so I missed the early
>>>> posting. I like this example a lot. I was able to get it to run using
>>>> the document analyzer from the uimaj-example. I have some questions
>>>> though: Is the testDict.xml just an arbitrary xml file, which means
>>>> any well-formed xml file should work? How do I get my own xml
>>>> dictionary files to work without transforming them into the xml format
>>>> in your testDict.xml file? Is there documentation for this so that I
>>>> can understand it on my own without bugging the entire list? Thanks!
>>>> Ahmed
>>>>
>>>> On Tue, Jun 17, 2008 at 8:05 PM, Michael Tanenblatt <
>>>> [EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> As Thilo mentioned in an email from May 19, 2008, I forgot to include
>>>>> the source for uima.tt.TokenAnnotation, but otherwise the code should
>>>>> be fine.
>>>>>
>>>>> Additionally, the problem you are seeing is with OffsetTokenizer,
>>>>> which is just a sample tokenizer--if you have another, more robust
>>>>> tokenizer, you don't need this OffsetTokenizer.
>>>>>
>>>>>
>>>>>
>>>
>
