I like the idea of creating FSs off of the base CAS, and doing the
indexing with respect to a CasView.
Adam Lally wrote:
There hasn't been any activity on this mail thread ("Always pass base
CAS to process method") in a couple of days so I thought I would
restart discussion with a summary of the conclusions we reached.
Please comment.

Basic Ideas:
*  The CAS is the container for all of the analysis data (as per the
UIMA spec).  It should be possible to create FS directly on the CAS
and there should be some reasonable way to access all FS in the CAS.
This seems reasonable - especially for "simpler" CAS Processors.


* A CasView is a way of accessing a subset of data in the CAS.  To
accomplish this a CasView has its own index repository.  A CasView may
also have a Sofa -- if it does this means that annotations in its
index repository must refer to that Sofa.
This is good if we're tying Sofas to CasViews, but I don't think that
tie is necessary (e.g. you could have multiple Sofas associated with
one CasView, or multiple CasViews associated with one Sofa).
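
To make the two basic ideas concrete, here is a minimal Java sketch. It is not the UIMA API: the types BaseCas, View and Fs and all of their methods are hypothetical names invented for illustration. It assumes a view refers to at most one Sofa (so it does not model several Sofas behind one view), but it does allow several views over the same Sofa.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real UIMA API. */
final class CasViewSketch {

    /** A feature structure; sofaId is null for data that is not an annotation. */
    static final class Fs {
        final String type;
        final String sofaId;

        Fs(String type, String sofaId) {
            this.type = type;
            this.sofaId = sofaId;
        }
    }

    /** A view owns an index repository and optionally refers to one Sofa. */
    static final class View {
        final String name;
        final String sofaId; // null when the view has no Sofa
        private final List<Fs> index = new ArrayList<>();

        View(String name, String sofaId) {
            this.name = name;
            this.sofaId = sofaId;
        }

        /** Indexes an FS; if the view has a Sofa, annotations must refer to it. */
        void addToIndexes(Fs fs) {
            if (sofaId != null && fs.sofaId != null && !sofaId.equals(fs.sofaId)) {
                throw new IllegalArgumentException("FS refers to a different Sofa");
            }
            index.add(fs);
        }

        /** Returns a copy of the FSs indexed in this view. */
        List<Fs> indexed() {
            return new ArrayList<>(index);
        }
    }

    /** The base CAS creates every FS and hands out named views. */
    static final class BaseCas {
        private final List<Fs> allFs = new ArrayList<>();
        private final Map<String, View> views = new LinkedHashMap<>();

        Fs createFs(String type, String sofaId) {
            Fs fs = new Fs(type, sofaId);
            allFs.add(fs);
            return fs;
        }

        View createView(String name, String sofaId) {
            View view = new View(name, sofaId);
            views.put(name, view);
            return view;
        }

        /** Number of FSs created on this CAS, indexed or not. */
        int size() {
            return allFs.size();
        }
    }
}
```

With this shape, one FS created on the base CAS can be indexed in two views that share a Sofa, while a view rejects annotations that refer to some other Sofa.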

Proposed changes to accomplish this:
* Add indexes to the base CAS:
  * Index definitions are shared across the entire CAS.
I added a discussion note about this, separately.
  * Each defined index will have one instance in the CAS as well as an
  instance for each view (or sofa?  right now sofas and views are 1-1 so
  it doesn't matter but I wonder what the right terminology is)
  * You can add FS to the indexes in a view (or multiple views).  You
  can also add FS to the indexes on the CAS, which is a place to store
  indexed FS that don't belong to any view.
The only use case for this that I remember was "globally" indexed
data, meaning data that is shared among a set of annotators.  But
there's a problem with this: once you put that set of annotators
together with another set, you run the risk of collisions among
"shared" items.  Sharing among a set is better served by having a
specially named view.  Using a global one may expose one to future
problems when combining independently written parts.
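
The collision concern can be illustrated with a small Java sketch. Everything in it is hypothetical (the Cas class, the view names "setA.shared" and "setB.shared", and the use of plain strings in place of FSs); it only contrasts the two ways of sharing, it is not UIMA code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the real UIMA API. Strings stand in for FSs. */
final class SharedIndexSketch {

    static final class Cas {
        /** Indexes on the base CAS, visible to every annotator. */
        final List<String> globalIndex = new ArrayList<>();
        private final Map<String, List<String>> viewIndexes = new LinkedHashMap<>();

        /** Returns the index of a named view, creating it on first use. */
        List<String> view(String name) {
            return viewIndexes.computeIfAbsent(name, k -> new ArrayList<>());
        }
    }

    /** Both annotator sets share data through the global index. */
    static Cas shareGlobally() {
        Cas cas = new Cas();
        cas.globalIndex.add("Lexicon from set A");
        cas.globalIndex.add("Lexicon from set B"); // lands next to A's entry
        return cas;
    }

    /** Each annotator set shares data through its own named view. */
    static Cas shareInNamedViews() {
        Cas cas = new Cas();
        cas.view("setA.shared").add("Lexicon from set A");
        cas.view("setB.shared").add("Lexicon from set B");
        return cas;
    }
}
```

In the first case an annotator from set A that iterates over the global index also sees set B's entry; in the second case each set sees only what was put in its own view.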

  * If you get an iterator over an index from the CAS, this iterator
  will return you FS that were indexed in the CAS as well as FS that were
  indexed in any view.
An interesting design choice.  What are the use cases for making it
work this way?  What do we give up when we choose this way, versus
just returning those FSs that were specifically indexed in the base
view?
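
The two iterator behaviors being compared can be sketched in Java as follows. This is an illustration only: the class and method names are hypothetical, strings stand in for FSs, and returning an FS once even when it is indexed in several views is an assumption of the sketch, not something settled in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the real UIMA API. Strings stand in for FSs. */
final class CasIteratorSketch {
    private final List<String> baseIndex = new ArrayList<>();
    private final Map<String, List<String>> viewIndexes = new LinkedHashMap<>();

    void indexInBase(String fs) {
        baseIndex.add(fs);
    }

    void indexInView(String view, String fs) {
        viewIndexes.computeIfAbsent(view, k -> new ArrayList<>()).add(fs);
    }

    /** Proposed behavior: FSs indexed in the CAS plus FSs indexed in any view. */
    List<String> allIndexed() {
        // Assumption: an FS indexed in several places is returned only once.
        Set<String> seen = new LinkedHashSet<>(baseIndex);
        for (List<String> index : viewIndexes.values()) {
            seen.addAll(index);
        }
        return new ArrayList<>(seen);
    }

    /** Alternative behavior: only FSs indexed directly in the CAS. */
    List<String> baseOnly() {
        return new ArrayList<>(baseIndex);
    }
}
```

The union gives one place to reach every indexed FS; the base-only variant keeps the CAS-level index separate from whatever the views contain.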

* Change CAS.getView(...) APIs to return the new type CasView.
CasView will have Sofa access methods and indexing methods but not
FS-creation methods.  (Except, maybe createAnnotation methods - see
next point.)
Good point - it drives home the idea that FS creation is always done in the base CAS.

* As I think about this now, it occurs to me that we should have a
method CAS.createAnnotation(int begin, int end, SofaFS sofa) to allow
annotations to be created off the CAS (consistent with the idea that
all analysis data can be created and accessed from the CAS).  But we
might also want CasView.createAnnotation(int begin, int end) as a
convenience.
Both createAnnotation methods are off of instances of CAS or CasView  -
they're not static, right?  In other words:
aCas.createAnnotation(...) or
aCasView.createAnnotation(...)?

We might want to consider this convenience method in the context of
backwards compatibility.
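
For reference, the two createAnnotation variants could look like this as instance methods. This is a sketch under assumptions, not the UIMA API: the Sofa, Annotation, Cas and CasView classes here are stand-ins, getView(Sofa) is a hypothetical signature, and the view method simply delegates to the CAS with the view's own Sofa.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are the real UIMA API. */
final class CreateAnnotationSketch {

    static final class Sofa {
        final String id;

        Sofa(String id) {
            this.id = id;
        }
    }

    static final class Annotation {
        final int begin;
        final int end;
        final Sofa sofa;

        Annotation(int begin, int end, Sofa sofa) {
            this.begin = begin;
            this.end = end;
            this.sofa = sofa;
        }
    }

    static final class Cas {
        private final List<Annotation> created = new ArrayList<>();

        /** Creates an annotation on the CAS; the Sofa is passed explicitly. */
        Annotation createAnnotation(int begin, int end, Sofa sofa) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid span");
            }
            Annotation annotation = new Annotation(begin, end, sofa);
            created.add(annotation);
            return annotation;
        }

        /** Hypothetical signature: returns a view bound to the given Sofa. */
        CasView getView(Sofa sofa) {
            return new CasView(this, sofa);
        }

        int createdCount() {
            return created.size();
        }
    }

    static final class CasView {
        private final Cas cas;
        private final Sofa sofa;

        CasView(Cas cas, Sofa sofa) {
            this.cas = cas;
            this.sofa = sofa;
        }

        /** Convenience: creation still happens on the CAS, using this view's Sofa. */
        Annotation createAnnotation(int begin, int end) {
            return cas.createAnnotation(begin, end, sofa);
        }
    }
}
```

Either call ends up creating the annotation in the same CAS; the view variant only saves the caller from passing the Sofa.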

None of this addresses backwards compatibility, but I think that is
important.  I can live with forcing manual porting of multi-Sofa
annotators.  I don't think I can live with forcing manual porting of
single-Sofa annotators.  I'll start another mail thread to discuss
backwards compatibility ideas for single-Sofa annotators.

-Adam


