Re: Question about Capabilities

Marshall Schor Thu, 29 May 2008 09:40:19 -0700

LeHouillier, Frank D. wrote:

Sorry for not being clear.  I was unclear from the documentation as to
whether the intent of the capabilities section of the Analysis Engine
Descriptors was to provide external guarantees provided by the framework
on the input/outputs to the annotator code or to provide a means for the
annotator code to determine internally which FS types to view and/or
produce (via the ResultsSpecification).

Currently, the input/output capabilities are not used by the framework
for providing external guarantees - they are used primarily for
configuring default ResultSpecifications and for the built-in Capability
Language Flow (flow controller). Because the input/output specs are
available metadata, application code and tooling can make use of these -
and that's something that the DocumentAnalyzer does.

Regarding the result specifications - an application can explicitly
provide result specifications when calling the analysis engine process
method; if that's not done, then the framework constructs a default
Result Specification from the input/output capabilities. See
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting

for more details.

So take the case of somebody using a primitive Analysis Engine with the
Document Analyzer.

Well, the Document Analyzer is a special case :-) It looks at the
output capabilities (any application can do this) and won't display
types that are present in the CAS it's given to display at the end, if
those types are not specified to be "output". So, the appearance from
the result viewer is that those types are not there, but the truth is
that they are in the CAS (if the Analysis Engine generated them and
indexed them) but are not displayed.

If they leave a type out of the AEDescriptor
Capabilities section then the type is not serialized into the xmi file
and viewable by the Document Analyzer viewer, thus some "filtering" of
the output at least, is taking place.

The Document Analyzer implementation (think of this as a particular
application implementation) takes an Analysis Engine Descriptor (a
primitive or an aggregate), runs it on an input or set of inputs,
serializes the resulting CASes to a results directory, and then calls a
viewer to display these, passing that viewer a special set of types to
display which it explicitly constructs from the Output Capabilities
specification metadata of the Analysis Engine it was given to run.

I don't think that the serialization step does any filtering (please let
me know if you have a test case showing that it does) other than not
outputting types which are not indexed and which cannot be reached from
types which are indexed. The viewer does have a filter. The Document
Analyzer code constructs a list of types to display from all the types,
keeping only those in the set designated as "outputs". This
is not a core framework filtering, it is rather just something this
particular application (the Document Analyzer) decided to do. The
filter includes all subtypes of types in the Output capabilities.
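To illustrate the serialization rule described above (what gets written is whatever is indexed, plus anything reachable from it), here is a minimal sketch in plain Java. It is not the UIMA serializer and does not use the UIMA API: the class ReachabilitySketch, the string ids standing in for feature structures, and the refs map standing in for feature references are all hypothetical names made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT the UIMA API: feature structures are plain string ids,
// and refs maps each id to the ids it references through its features.
final class ReachabilitySketch {

    /** Returns the indexed ids plus every id reachable from them via refs. */
    static Set<String> serialized(Set<String> indexed, Map<String, List<String>> refs) {
        Set<String> seen = new LinkedHashSet<>(indexed);
        Deque<String> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            String fs = work.remove();
            for (String target : refs.getOrDefault(fs, Collections.emptyList())) {
                // Set.add returns true only the first time, so each id is visited once.
                if (seen.add(target)) {
                    work.add(target);
                }
            }
        }
        return seen;
    }
}
```

In this model, an id that is neither indexed nor referenced (directly or indirectly) from an indexed id is simply never visited, which matches the only "filtering" I would expect from serialization.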

Now suppose they have a type
system with Annotation type of Vehicle with subtypes Car, Submarine,
etc. but they only want to see in the Document Analyzer what was
annotated as a Vehicle.  My first instinct was that they could leave out
the types for Car, Submarine, etc., and only include Vehicle as an
output in the capabilities section and all of the annotations would be
serialized not as Car, Submarine, but as Vehicle and thus, when they
looked at the xmi file through the document analyzer a nice Vehicle type
would be viewable.  This isn't the case, instead the person gets the
subtypes, highlighted separately.

Yes. I looked at the code for DocumentAnalyzer, and it explicitly
includes subtypes. See the source for DocumentAnalyzer, line 1194.
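As a rough sketch of that display filter, using the Vehicle example from the question: the code below is plain Java written for illustration, not the actual DocumentAnalyzer source and not the UIMA TypeSystem API. The class DisplayFilterSketch, the parentOf map (type name to direct supertype), and the short type names are assumptions for this example, and the map is assumed to contain no cycles.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model, NOT the UIMA API: the type hierarchy is a map from
// each type name to its direct supertype.
final class DisplayFilterSketch {

    /** True if type is ancestor itself or inherits from it via parentOf. */
    static boolean isSameOrSubtype(String type, String ancestor, Map<String, String> parentOf) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps every type that is a declared output or a subtype of one. */
    static Set<String> typesToDisplay(Collection<String> allTypes,
                                      Set<String> declaredOutputs,
                                      Map<String, String> parentOf) {
        Set<String> shown = new TreeSet<>();
        for (String type : allTypes) {
            for (String output : declaredOutputs) {
                if (isSameOrSubtype(type, output, parentOf)) {
                    shown.add(type);
                    break;
                }
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("Vehicle", "Annotation");
        parentOf.put("Car", "Vehicle");
        parentOf.put("Submarine", "Vehicle");
        parentOf.put("Person", "Annotation");
        List<String> all = Arrays.asList("Annotation", "Vehicle", "Car", "Submarine", "Person");
        // Only Vehicle is declared as an output, yet its subtypes are shown too.
        System.out.println(typesToDisplay(all, Collections.singleton("Vehicle"), parentOf));
        // prints [Car, Submarine, Vehicle]
    }
}
```

So with only Vehicle listed as an output, Car and Submarine still pass the filter and are displayed as their own types, which is the behavior you observed.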

  My guess is that the "filtering"
behavior is a result of an implementation of the Document Analyzer
rather than something enforced by the framework, but I wasn't sure.

Yes, that is correct.

-Marshall
