I'm starting a vote on this on the uima-dev mailing list.
-Marshall

Marshall Schor wrote:
> Michael Tanenblatt wrote:
>> So, is there any actual interest in accepting this into the sandbox?
>> Discussions died down with no resolution.
>>
>> ...m
> Yes, please submit a Jira issue with an attachment and a checksum
> for it. Then we'll call an official vote on the uima-dev list.
> -Marshall
>>
>> On May 15, 2008, at 12:24 PM, Igor Sominsky wrote:
>>
>>> My group would like to offer the following UIMA component, Common
>>> Feature Extractor (CFE), as an open source offering into the UIMA
>>> sandbox, assuming there is interest from the community:
>>>
>>> CFE enables configuration-driven extraction of feature values from
>>> UIMA annotations contained in a CAS. The extracted information can
>>> be used for statistical analysis, performance metrics evaluation,
>>> regression testing and machine-learning-related processing.
>>>
>>> CFE provides a flexible yet powerful language, FESL (Feature
>>> Extraction Specification Language), for working with the UIMA CAS
>>> to enable the collection and classification of resultant data. FESL
>>> is a declarative XML-based language that expresses semantic rules
>>> for the feature extraction. The rules guide the feature extraction
>>> in a completely generalized way, and CFE provides methods for
>>> subsequent processing to format the output of the extraction as
>>> needed for downstream use. The destination for the output is
>>> defined by the particular application where CFE is used (CAS,
>>> external file, database, etc.). CFE could be implemented as either
>>> a TAE or a CAS Consumer, depending on a particular application's
>>> needs.
>>>
>>> FESL rules allow a flexible and powerful way of defining
>>> multi-parameter criteria for specific information to be extracted
>>> from the CAS. Such criteria can be customized by:
>>>
>>>   1. the type of the UIMA annotation object that contains the
>>>      feature of interest
>>>   2. a surrounding (enclosing) annotation type and the relative
>>>      location of the object within the enclosure, which limits the
>>>      extraction to within the boundaries of a certain UIMA type
>>>   3. the "path" to the feature from the annotation object
>>>   4. the type and value of the feature itself
>>>   5. values of any public Java get-style methods (methods that
>>>      accept no parameters and return a value) implemented by the
>>>      underlying class of the feature
>>>   6. the location of the object or the feature on a specific path
>>>      (in cases when it is required to select/bypass annotations if
>>>      they are features of other UIMA annotation types)
>>>
>>> The feature values can be evaluated by conditional expressions
>>> stated in FESL. In particular, the feature values can be tested for
>>> whether they:
>>>
>>>   1. are of a certain type
>>>   2. belong to a specific set of values (vocabulary)
>>>   3. belong to a range of numeric values (inclusively or
>>>      non-inclusively)
>>>   4. match certain bits of a bit mask (integer values only)
>>>   5. match a Java regular expression pattern
>>>
>>> These expressions can be specified in disjunctive normal form, which
>>> gives a powerful and flexible way of defining fairly complex
>>> criteria for the extraction of a required annotation and/or its
>>> value.
>>>
>>> FESL itself is defined in XSD format and integrated with EMF for
>>> syntax validation and automated code generation.
>>>
>>> CFE has been successfully used in several internal projects for
>>> evaluation of performance metrics and machine learning.
>>>
>>> CFE is described in more detail in the paper "CFE - a system for
>>> testing, evaluation and machine learning of UIMA based
>>> applications", by I. Sominsky, A. Coden, M. Tanenblatt, which will
>>> be presented at the UIMA for NLP workshop as part of the LREC 2008
>>> conference in Marrakech, Morocco.
>>>
>>> Igor Sominsky
>>>
>>> [EMAIL PROTECTED]
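
For readers trying to picture the five kinds of feature-value conditions and the disjunctive normal form described in the proposal above, here is a minimal plain-Java sketch. It is not CFE or FESL code and does not use the UIMA API: the class FeatureConditionSketch, all of its method names, and the demo rule are hypothetical, invented only for illustration. A rule is modelled as an OR over groups of conditions, where every condition in a group must hold.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names come from CFE or FESL.
public class FeatureConditionSketch {

    // Condition 1: the value is of a certain type.
    public static Predicate<Object> isType(Class<?> type) {
        return type::isInstance;
    }

    // Condition 2: the value belongs to a specific set of values (vocabulary).
    public static Predicate<Object> inVocabulary(Set<String> vocabulary) {
        return v -> v instanceof String && vocabulary.contains(v);
    }

    // Condition 3: the value lies in a numeric range, inclusively or not.
    public static Predicate<Object> inRange(double low, double high, boolean inclusive) {
        return v -> {
            if (!(v instanceof Number)) {
                return false;
            }
            double d = ((Number) v).doubleValue();
            return inclusive ? (d >= low && d <= high) : (d > low && d < high);
        };
    }

    // Condition 4: selected bits of an integer value match an expected pattern.
    public static Predicate<Object> matchesBits(int mask, int expected) {
        return v -> v instanceof Integer && (((Integer) v) & mask) == expected;
    }

    // Condition 5: the whole value matches a Java regular expression.
    public static Predicate<Object> matchesPattern(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return v -> v instanceof String && pattern.matcher((String) v).matches();
    }

    // Disjunctive normal form: true if at least one group has all conditions true.
    public static boolean evaluateDnf(List<List<Predicate<Object>>> dnf, Object value) {
        return dnf.stream()
                  .anyMatch(group -> group.stream().allMatch(p -> p.test(value)));
    }

    // Made-up rule: (String AND all upper-case letters)
    //            OR (number in [0, 10] AND lowest bit clear, i.e. even)
    //            OR (one of the words "noun" or "verb").
    public static List<List<Predicate<Object>>> demoRule() {
        return List.of(
            List.of(isType(String.class), matchesPattern("[A-Z]+")),
            List.of(inRange(0, 10, true), matchesBits(0b1, 0b0)),
            List.of(inVocabulary(Set.of("noun", "verb"))));
    }

    public static void main(String[] args) {
        for (Object value : new Object[] {"NP", "np", 4, 5, "noun"}) {
            System.out.println(value + " -> " + evaluateDnf(demoRule(), value));
        }
    }
}
```

In the proposal these criteria are declared in FESL's XML and applied to features reached from UIMA annotations; the sketch only shows how such conditions compose logically, with arbitrary Java values standing in for feature values.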
