Bart Mellebeek wrote:
> Marshall Schor wrote:
>> Bart Mellebeek wrote:
>>
>>>> Hello,
>>>>
>>>> I have a question on the exact role of the output types in the
>>>> Capabilities of an AE descriptor that I couldn't find in the
>>>> documentation.
>>>> A strange thing happens when I try to manipulate the descriptors of
>>>> ex4/ of the tutorial in uimaj-examples. I am running
>>>> ex4/MeetingDetectorTAE.xml with UIMA Document Analyzer. When I delete
>>>> the output type RoomNumber in the Capabilities of
>>>> ex2/RoomNumberAnnotator.xml and I run ex4/MeetingDetectorTAE.xml, the
>>>> RoomNumber type is still visible in the analysis results.
>>>>
>>
>> I think this is because ex4/MeetingDetectorTAE.xml itself declares it
>> outputs the RoomNumber type. The DocumentAnalyzer is just a sample
>> application that shows *selected* feature structure types - selected by
>> looking at the output capabilities of the top-most analysis engine (in
>> the case of an aggregate having "nested" components - such as you have
>> in your example). This means that the DocumentAnalyzer may not be
>> showing all the feature structures in the CAS, but that doesn't mean
>> that those feature structures are not there.
>>
>> See the code in uimaj-tools project: in
>> src/main/org/apache/uima/tools/docanalyzer/DocumentAnalyzer.java,
>> lines 1185 - 1207.
>>
>>>> Likewise, when I delete the output types TimeAnnot and DateAnnot in
>>>> the capabilities of ex3/TutorialDateTime.xml, these types are still
>>>> visible in the analysis results.
>>
>> I think for the same reason - the ex4/MeetingDetectorTAE.xml itself
>> declares it outputs the DateAnnot and TimeAnnot feature structures.
>>
>>>> Only deleting the output type DateTimeAnnot in the capabilities of
>>>> ex3/TutorialDateTime.xml seems to have an impact on the analysis
>>>> results.
>>>>
>>
>> I ran the DocAnalyzer without modifying the examples, and the
>> DateTimeAnnot does *not* appear - this is the expected behavior
>> because it is not listed in the DocumentAnalyzer's output
>> capabilities. I think it will not appear, even if you don't delete
>> the output type DateTimeAnnot in the capabilities of
>> ex3/TutorialDateTime.xml.
>>
>>>> Why is it that deleting some output types has no impact on analysis
>>>> results, while deleting other output types does have an impact? Aren't
>>>> all output types supposed to have this impact?
>>>>
>>
>> The UIMA framework makes the UIMA Metadata available to applications,
>> but doesn't specify what those applications do with that data. The
>> DocumentAnalyzer is just a sample application - built to show many of
>> the capabilities of UIMA. It took a particular design choice - to
>> show annotations in the CAS that were specified as output
>> capabilities of the top-most component (in the case of aggregates).
>> Hope that helps.
>>
>> -Marshall
>>
>>> Any help appreciated.
>>> Thanks,
>>>
>>> Bart
>>>
>
> Thanks for your input.
>
> I asked this question because I am trying to build a UIMA pipeline and
> the role of the AE capabilities in the intermediate annotators is not
> entirely clear to me. I was under the impression that for each
> annotator in the pipeline, the capabilities specify which are its
> input/output types.

Yes, I think that is correct.

> However, apparently once an annotation is inside the CAS, the
> specifications in the capabilities of the AEs do not seem to be
> relevant anymore.
>
> For example, take the aggregate ex4/MeetingDetectorTAE.xml.
> MeetingAnnotator.java uses the types RoomNumber, DateAnnot and
> TimeAnnot to detect meetings.
> What surprises me is that deleting the
> output type RoomNumber in ex2/RoomNumberAnnotator.xml and deleting all
> the input types in ex4/MeetingAnnotator.xml (RoomNumber, DateAnnot and
> TimeAnnot) has no effect at all on the output: meetings are still
> detected correctly although these types have been deleted from the
> capabilities.

The use of the capabilities varies within the framework, so there is no
simple answer. One thing that capabilities currently are *not* used for
is deleting elements out of the CAS - that is why things still work in
the case you cite.
Some things capabilities are used for include setting the default Result
Specification (see
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting
). Another one is the CapabilityLanguageFlow (search for
capabilitylanguageflow in
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/references/references.html).

> Is this just because these annotations are already readily available
> in the CAS and if so, what exactly is the role of the capabilities and
> when should their types/features be specified?

The best practice is to use these specifications to document, for each
component, what inputs and outputs it needs / produces. In the future,
UIMA may be enhanced to do more with these, or tooling may be developed
that does more with this metadata (for instance, configuration tooling
that ensures a "flow" makes sense - that things needed are produced
before they're needed, etc.).

>
> Sorry if this is a basic question: I'm new to this.

No problem.

> Thank you for your time,

You're welcome, and welcome to UIMA :-)

-Marshall

>
> Bart
>
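
To make the display behavior discussed in this thread concrete, here is a
minimal sketch of the idea in plain Java. It is a toy model, not the actual
DocumentAnalyzer code and not the UIMA API: the class name
CapabilityFilterSketch, the method displayedTypes, and the use of short
strings as type names are all assumptions made for illustration. The
"Meeting" entry in the aggregate's output list is likewise assumed; the
thread only confirms RoomNumber, DateAnnot and TimeAnnot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: strings stand in for UIMA types; no UIMA classes are used. */
public class CapabilityFilterSketch {

    /**
     * Returns the type names a viewer would show if it filters the CAS
     * contents by the output capabilities of the top-most analysis engine.
     */
    static List<String> displayedTypes(List<String> typesInCas, Set<String> topLevelOutputs) {
        List<String> shown = new ArrayList<>();
        for (String type : typesInCas) {
            if (topLevelOutputs.contains(type)) {
                shown.add(type);
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        // What the annotators put into the CAS. This does not depend on
        // what any nested descriptor declares in its capabilities.
        List<String> typesInCas = Arrays.asList(
                "RoomNumber", "DateAnnot", "TimeAnnot", "DateTimeAnnot", "Meeting");

        // Assumed output capabilities of the top-most aggregate
        // ("Meeting" is an assumption; DateTimeAnnot is not declared).
        Set<String> aggregateOutputs = new HashSet<>(Arrays.asList(
                "RoomNumber", "DateAnnot", "TimeAnnot", "Meeting"));

        // Only the declared types are displayed; the rest stay in the CAS.
        System.out.println(displayedTypes(typesInCas, aggregateOutputs));
    }
}
```

In this model, editing the capabilities of a nested annotator changes
nothing, because neither the CAS contents nor the top-level output list
depends on them. DateTimeAnnot is still in the CAS but is not shown,
because the top-level list does not include it.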
