LeHouillier, Frank D. wrote:
Sorry for not being clear.  I was unclear from the documentation as to
whether the intent of the capabilities section of the Analysis Engine
Descriptors was to provide external guarantees provided by the framework
on the input/outputs to the annotator code or to provide a means for the
annotator code to determine internally which FS types to view and/or
produce (via the ResultsSpecification).
Currently, the framework does not use the input/output capabilities to provide external guarantees. They are used primarily for configuring default ResultSpecifications and by the built-in Capability Language Flow (a flow controller). Because the input/output specs are available as metadata, application code and tooling can also make use of them, and that's something the DocumentAnalyzer does. Regarding result specifications: an application can explicitly provide one when calling the analysis engine's process method; if it doesn't, the framework constructs a default Result Specification from the input/output capabilities. See http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting
for more details.
So take the case of somebody using a primitive Analysis Engine with the
Document Analyzer.
Well, the Document Analyzer is a special case :-) It looks at the output capabilities (any application can do this) and, when it displays the final CAS, it won't show types that are not declared as "output", even if they are present in that CAS. So in the result viewer it looks as though those types are not there, but in fact they are in the CAS (provided the Analysis Engine generated and indexed them); they are just not displayed.
If they leave a type out of the AEDescriptor
Capabilities section then the type is not serialized into the xmi file
and viewable by the Document Analyzer viewer, thus some "filtering" of
the output at least, is taking place.
The Document Analyzer implementation (think of this as a particular application implementation) takes an Analysis Engine Descriptor (a primitive or an aggregate), runs it on an input or set of inputs, serializes the resulting CASes to a results directory, and then calls a viewer to display them, passing that viewer a special set of types to display, which it explicitly constructs from the Output Capabilities specification metadata of the Analysis Engine it was given to run.

I don't think that the serialization step does any filtering (please let me know if you have a test case showing that it does) other than not outputting feature structures which are not indexed and which cannot be reached from ones which are indexed. The viewer does have a filter. The Document Analyzer code constructs the list of types to display by filtering the set of all types down to those designated as "outputs". This is not core framework filtering; it is just something this particular application (the Document Analyzer) decided to do. The filter includes all subtypes of types in the Output capabilities.
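
The serialization rule mentioned above (only indexed feature structures, plus anything reachable from them, get written out) can be illustrated with a small sketch. This is plain Java, not the UIMA API: feature structures are modeled as string ids, references as a map, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not UIMA code: ids stand in for feature structures.
final class ReachabilitySketch {

    // Returns every id that is indexed or reachable from an indexed id
    // by following references.
    static Set<String> serialized(Set<String> indexed, Map<String, List<String>> refs) {
        Set<String> out = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            String fs = todo.poll();
            if (out.add(fs)) { // first visit: follow its references too
                todo.addAll(refs.getOrDefault(fs, List.of()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // car1 is indexed and points at engine1; orphan1 is not indexed
        // and nothing indexed points at it, so it is left out.
        Map<String, List<String>> refs = Map.of(
                "car1", List.of("engine1"),
                "orphan1", List.of("engine2"));
        System.out.println(serialized(Set.of("car1"), refs)); // prints [car1, engine1]
    }
}
```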
Now suppose they have a type
system with Annotation type of Vehicle with subtypes Car, Submarine,
etc. but they only want to see in the Document Analyzer what was
annotated as a Vehicle.  My first instinct was that they could leave out
the types for Car, Submarine, etc., and only include Vehicle as an
output in the capabilities section and all of the annotations would be
serialized not as Car, Submarine, but as Vehicle and thus, when they
looked at the xmi file through the document analyzer a nice Vehicle type
would be viewable.  This isn't the case, instead the person gets the
subtypes, highlighted separately.
Yes. I looked at the code for DocumentAnalyzer, and it explicitly includes subtypes. See the source for DocumentAnalyzer, line 1194.
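
To show why declaring only Vehicle as an output still brings up Car and Submarine in the viewer, here is a minimal sketch of a display filter that keeps a type if it is a declared output or a subtype of one. It is plain Java written for illustration, not the actual DocumentAnalyzer source: the child-to-parent map, the class name, and the method names are all assumptions.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the DocumentAnalyzer code: types are plain strings
// and the type hierarchy is a child -> parent map.
final class DisplayFilterSketch {

    // True if 'type' is 'ancestor' itself or inherits from it.
    static boolean subsumes(Map<String, String> parentOf, String ancestor, String type) {
        Set<String> seen = new HashSet<>(); // guards against a cyclic map
        for (String t = type; t != null && seen.add(t); t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Keeps each type that is a declared output or a subtype of one.
    static Set<String> typesToDisplay(Map<String, String> parentOf,
                                      List<String> allTypes,
                                      Set<String> outputTypes) {
        Set<String> shown = new LinkedHashSet<>();
        for (String type : allTypes) {
            for (String out : outputTypes) {
                if (subsumes(parentOf, out, type)) {
                    shown.add(type);
                    break;
                }
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        Map<String, String> parentOf = Map.of(
                "Car", "Vehicle",
                "Submarine", "Vehicle",
                "Vehicle", "Annotation",
                "Person", "Annotation");
        List<String> allTypes = List.of("Annotation", "Vehicle", "Car", "Submarine", "Person");
        // Only Vehicle is declared as an output, yet its subtypes pass the filter.
        System.out.println(typesToDisplay(parentOf, allTypes, Set.of("Vehicle")));
        // prints [Vehicle, Car, Submarine]
    }
}
```

With a filter of this shape, the subtypes are listed (and so highlighted) as separate types, which matches what you saw.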
  My guess is that the "filtering"
behavior is a result of an implementation of the Document Analyzer
rather than something enforced by the framework, but I wasn't sure.
Yes, that is correct.
-Marshall
