So, is there any actual interest in accepting this into the sandbox?
Discussions died down with no resolution.
...m
On May 15, 2008, at 12:24 PM, Igor Sominsky wrote:
My group would like to offer the following UIMA component, Common
Feature Extractor (CFE), as an open source offering into the UIMA
sandbox, assuming there is interest from the community:
CFE enables configuration-driven extraction of feature values from
UIMA annotations contained in a CAS. The extracted information can be
used for statistical analysis, evaluation of performance metrics,
regression testing, and machine-learning-related processing.
CFE provides a flexible yet powerful language, FESL (Feature
Extraction Specification Language), for working with the UIMA CAS to
enable the collection and classification of resultant data. FESL is
a declarative XML-based language that expresses semantic rules for
the feature extraction. The rules guide the feature extraction in a
completely generalized way, while CFE provides methods for
subsequent processing that format the output of the extraction as
needed for downstream use. The destination for the output is
defined by the particular application in which CFE is used (CAS,
external file, database, etc.). CFE could be implemented as either a
TAE or a CAS Consumer, depending on the needs of the particular
application.
FESL rules provide a flexible and powerful way of defining
multi-parameter criteria for the specific information to be
extracted from the CAS. Such criteria can be customized by:
1. the type of the UIMA annotation object that contains the feature
of interest
2. a surrounding (enclosing) annotation type and the relative
location of the object within the enclosure, which limits the
extraction to the boundaries of a certain UIMA type
3. the "path" to the feature from the annotation object
4. the type and value of the feature itself
5. the values of any public Java get-style methods (methods that
accept no parameters and return a value) implemented by the
underlying class of the feature
6. the location of the object or the feature on a specific path (for
cases where annotations must be selected or bypassed when they are
features of other UIMA annotation types)
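To make criteria 1 to 3 concrete, here is a rough Python sketch of selecting feature values by annotation type, enclosing annotation, and feature path. It is only an illustration: the `Annotation` class, `resolve_path`, `select`, the dot-separated path notation, and the sample data are all hypothetical stand-ins, not FESL syntax and not the UIMA or CFE API.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Toy stand-in for a UIMA annotation (hypothetical, not the UIMA API)."""
    type_name: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


def resolve_path(ann, path):
    """Follow a dot-separated feature path; return None if any step is missing."""
    current = ann
    for step in path.split("."):
        if not isinstance(current, Annotation) or step not in current.features:
            return None
        current = current.features[step]
    return current


def select(annotations, type_name, path, enclosing_type=None):
    """Collect feature values from annotations of `type_name`, optionally
    restricted to those lying inside an annotation of `enclosing_type`."""
    enclosures = [a for a in annotations if a.type_name == enclosing_type]
    results = []
    for ann in annotations:
        if ann.type_name != type_name:
            continue
        if enclosing_type is not None and not any(
            e.begin <= ann.begin and ann.end <= e.end for e in enclosures
        ):
            continue
        value = resolve_path(ann, path)
        if value is not None:
            results.append(value)
    return results


# Hypothetical sample data: one sentence, one token inside it, one outside.
cas = [
    Annotation("Sentence", 0, 20),
    Annotation("Token", 0, 5, {
        "pos": "NN",
        "lemma": Annotation("Lemma", 0, 5, {"key": "dog"}),
    }),
    Annotation("Token", 25, 30, {"pos": "VB"}),
]

inside_sentence = select(cas, "Token", "pos", enclosing_type="Sentence")
all_tokens = select(cas, "Token", "pos")
nested = select(cas, "Token", "lemma.key")
```

In this sketch the enclosing type narrows the result to tokens whose span falls within a sentence, and the path "lemma.key" reaches a feature of a nested annotation.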
The feature values can be evaluated by conditional expressions
stated in FESL. In particular, a feature value can be tested for
whether it:
1. is of a certain type
2. belongs to a specific set of values (vocabulary)
3. belongs to a range of numeric values (inclusive or
non-inclusive)
4. matches certain bits of a bit mask (integer values only)
5. matches a Java regular expression pattern
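The five kinds of test listed above could be sketched as plain predicates, roughly as follows. This is an illustration in Python, not CFE code: every function name and parameter here is hypothetical, Python's `re` module stands in for Java regular expressions (the two syntaxes differ in places), and requiring a full match and all mask bits by default are assumptions of the sketch.

```python
import re


def is_of_type(value, expected_type):
    """Test 1: the value is of a certain type."""
    return isinstance(value, expected_type)


def in_vocabulary(value, vocabulary):
    """Test 2: the value belongs to a specific set of values."""
    return value in vocabulary


def in_range(value, low, high, inclusive=True):
    """Test 3: the value lies in a numeric range, with or without the bounds."""
    if inclusive:
        return low <= value <= high
    return low < value < high


def matches_bits(value, mask, exact=True):
    """Test 4: with exact=True every bit of the mask must be set in the
    value; otherwise a single shared bit is enough (integers only)."""
    if exact:
        return (value & mask) == mask
    return (value & mask) != 0


def matches_pattern(value, pattern):
    """Test 5: the whole string matches the regular expression
    (full-match semantics are an assumption of this sketch)."""
    return re.fullmatch(pattern, value) is not None
```

For example, `in_range(5, 1, 5)` holds while `in_range(5, 1, 5, inclusive=False)` does not, which is the inclusive versus non-inclusive distinction from item 3.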
These expressions can be specified in disjunctive normal form, which
gives a powerful and flexible way of defining fairly complex
criteria for the extraction of a required annotation and/or its value.
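As a reminder of what disjunctive normal form means in practice: a condition is an OR over clauses, and each clause is an AND over simple tests. A minimal Python sketch follows; `evaluate_dnf` and the sample clauses are hypothetical and say nothing about how FESL actually encodes such expressions.

```python
def evaluate_dnf(value, clauses):
    """Return True if at least one clause has all of its predicates satisfied.

    `clauses` is a list of conjunctions; each conjunction is a list of
    one-argument predicates applied to the same value.
    """
    return any(all(pred(value) for pred in conj) for conj in clauses)


# Hypothetical condition: (1 <= v <= 10 AND v is even) OR (v > 100)
clauses = [
    [lambda v: 1 <= v <= 10, lambda v: v % 2 == 0],
    [lambda v: v > 100],
]

accepted = [v for v in (4, 5, 50, 150) if evaluate_dnf(v, clauses)]
```

Here 4 satisfies the first clause and 150 the second, while 5 and 50 satisfy neither, so only 4 and 150 are accepted.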
FESL itself is defined in XSD format and integrated with EMF for
syntax validation and automated code generation.
CFE has been successfully used in several internal projects for
evaluation of performance metrics and machine learning.
CFE is described in more detail in the paper "CFE - a system for
testing, evaluation and machine learning of UIMA based
applications", by I. Sominsky, A. Coden, and M. Tanenblatt, which
will be presented at the UIMA for NLP workshop as part of the LREC
2008 conference in Marrakech, Morocco.
Igor Sominsky
[EMAIL PROTECTED]