Re: proposal for a new testing and evaluation component

Igor Sominsky Thu, 15 May 2008 10:46:35 -0700

The vocabulary can be either fully embedded in the configuration file or referenced by a URI. Any UIMA annotation feature or result of a get-like method (getCoveredText, for instance) can be tested for membership in the list, so it can be included in or excluded from extraction.
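As a rough illustration of the vocabulary check (plain Python, not CFE or FESL code), the sketch below loads a word list and computes a boolean "is in the list" feature per token. The names load_vocabulary and in_vocabulary and the sample data are hypothetical; in CFE the list would come from the configuration file or a URI, and the tested value from a feature or a get-like method.

```python
from __future__ import annotations

from typing import Iterable


def load_vocabulary(lines: Iterable[str]) -> frozenset[str]:
    """Build a vocabulary from lines, skipping blanks and trimming whitespace."""
    return frozenset(line.strip() for line in lines if line.strip())


def in_vocabulary(value: str, vocabulary: frozenset[str]) -> bool:
    """Return True if the value (e.g. a token's covered text) is in the list."""
    return value in vocabulary


# Hypothetical last-name list; a real one might be read from a file or URL.
last_names = load_vocabulary(["Goetz", "Sominsky", "", "Coden"])

# Stand-ins for the covered text of three token annotations.
tokens = ["Thilo", "Goetz", "wrote"]

# One (token, membership) pair per token.
features = [(t, in_vocabulary(t, last_names)) for t in tokens]
```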

I am not sure I understand your second question correctly, but let me try to answer it. CFE implements the extraction process in two steps. In the first step, an annotation that represents a certain concept is located. It can be a single-word annotation (uima.tt.TokenAnnotation, for instance) or a custom type annotation that contains the group of words in its properties (an FSArray, for instance). In either case, your concept must be represented by a single annotation. In the second step, annotations that are in a certain context (defined by the configuration file) of your concept annotation are located. For example, the configuration file could specify extracting features from the 5 annotations to the left of the annotation that represents the concept (let's say a particular word). The annotations located in the second step are the ones the features are extracted from. I hope I got your question right.
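The two steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not CFE code: the Annotation record, the whitespace tokenizer, and the helpers find_concept and left_context are all hypothetical stand-ins for UIMA annotations and for what a FESL configuration would express.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Minimal stand-in for a UIMA annotation: a typed span of text."""
    type: str
    begin: int
    end: int
    text: str


def tokenize(text):
    """Create one 'Token' annotation per whitespace-separated word."""
    return [Annotation("Token", m.start(), m.end(), m.group())
            for m in re.finditer(r"\S+", text)]


def find_concept(annotations, word):
    """Step 1: locate the single annotation that represents the concept."""
    return next((a for a in annotations if a.text == word), None)


def left_context(annotations, concept, size, context_type="Token"):
    """Step 2: up to `size` annotations ending before the concept, in text order."""
    preceding = sorted(
        (a for a in annotations
         if a.type == context_type and a.end <= concept.begin),
        key=lambda a: a.begin,
    )
    return preceding[-size:] if size > 0 else []


tokens = tokenize("the quick brown fox jumps over the lazy dog")
concept = find_concept(tokens, "over")
window = left_context(tokens, concept, 5)

# Features would be extracted from these annotations, not from the concept.
window_texts = [a.text for a in window]
```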


Igor

----- Original Message -----
From: "Thilo Goetz" <[EMAIL PROTECTED]>

To: <[email protected]>
Sent: Thursday, May 15, 2008 1:07 PM
Subject: Re: proposal for a new testing and evaluation component

Cool, we absolutely need this!  I was actually about to
write something like this myself, but now I think I can
wait a little longer :-)

I have quite a few questions on this, here are just some
of them:

Can you integrate external resources in the process?  For
example, I might have a list of last names, and a feature
might be if a token occurs in that list or not.

I'd like to apply this to learning for individual words
or word windows.  Is that possible with/supported by your
tool?

--Thilo

Igor Sominsky wrote:
My group would like to offer the following UIMA component, Common Feature Extractor (CFE), as an open source offering into the UIMA sandbox, assuming there is interest from the community:
CFE enables configuration-driven extraction of feature values from UIMA annotations contained in a CAS. The extracted information can be used for statistical analysis, performance metrics evaluation, regression testing and machine learning related processing. CFE provides a flexible, yet powerful language, FESL (Feature Extraction Specification Language), for working with the UIMA CAS to enable the collection and classification of resultant data. FESL is a declarative XML-based language that expresses semantic rules for the feature extraction. The rules guide the feature extraction in a completely generalized way, while CFE provides methods for subsequent processing to format the output of the extraction as needed for downstream use. The destination for the output is defined by the particular application where CFE is used (CAS, external file, database, etc.). CFE could be implemented as either a TAE or a CAS Consumer, depending on the needs of the particular application.
FESL rules allow a flexible and powerful way of defining multi-parameter criteria for specific information to be extracted from the CAS. Such criteria can be customized by:
  1.. a type of a UIMA annotation object that contains the feature of interest
  2.. a surrounding (enclosing) annotation type and a relative location of the object within the enclosure, which limits the extraction to the boundaries of a certain UIMA type
  3.. "path" to the feature from the annotation object
  4.. a type and value of the feature itself
  5.. values of any public Java get-style methods (methods that accept no parameters and return a value) implemented by the underlying class of the feature
  6.. a location of the object or the feature on a specific path (in cases when it is required to select/bypass annotations if they are features of other UIMA annotation types)
The feature values can be evaluated by conditional expressions stated in FESL. In particular, the feature values can be evaluated as to whether they:
  1.. are of a certain type
  2.. belong to a specific set of values (vocabulary)
  3.. belong to a range of numeric values (inclusively or non-inclusively)
  4.. match certain bits of a bit mask (integer values only)
  5.. match a Java regular expression pattern
These expressions can be specified in disjunctive normal form, which gives a powerful and flexible way of defining fairly complex criteria for the extraction of a required annotation and/or its value.
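To make the five kinds of conditions and their combination in disjunctive normal form concrete, here is a minimal Python sketch. It is not FESL syntax and not CFE code; every helper name is hypothetical. Two assumptions are made where the description leaves room: a bit mask "matches" when all bits of the mask are set in the value, and a regular expression must match the whole value.

```python
import re


def is_type(expected):
    """Condition 1: the value is of a certain type."""
    return lambda v: isinstance(v, expected)


def in_set(vocabulary):
    """Condition 2: the value belongs to a specific set of values."""
    allowed = frozenset(vocabulary)
    return lambda v: v in allowed


def in_range(low, high, inclusive=True):
    """Condition 3: the value lies in a numeric range."""
    if inclusive:
        return lambda v: low <= v <= high
    return lambda v: low < v < high


def matches_mask(mask):
    """Condition 4 (assumption): all bits of the mask are set in an integer."""
    return lambda v: isinstance(v, int) and (v & mask) == mask


def matches_regex(pattern):
    """Condition 5 (assumption): the whole string matches the pattern."""
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def dnf(*clauses):
    """Disjunctive normal form: an OR over clauses, each an AND of conditions."""
    return lambda v: any(all(cond(v) for cond in clause) for clause in clauses)


# Accept integers from 1 to 10, OR capitalized alphabetic strings.
accept = dnf(
    [is_type(int), in_range(1, 10)],
    [is_type(str), matches_regex(r"[A-Z][a-z]+")],
)

results = [accept(5), accept(11), accept("Goetz"), accept("goetz")]
```

Because all() stops at the first failing condition, the type check at the start of each clause keeps the range test from ever being applied to a string.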
The FESL itself is defined in XSD format and integrated with EMF for syntax validation and automated code generation. CFE has been successfully used in several internal projects for evaluation of performance metrics and machine learning.
CFE is described in more detail in the paper "CFE - a system for testing, evaluation and machine learning of UIMA based applications", by I. Sominsky, A. Coden, M. Tanenblatt, which will be presented at the UIMA for NLP workshop as part of the LREC 2008 conference in Marrakech, Morocco.
Igor Sominsky
[EMAIL PROTECTED]
