A few quick comments here, then I'll deal with the big issues in another email.

On 12/22/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
Marshall Schor wrote:
> In this discussion, I think some confusion arises from the use of
> "index" to mean both the index definition, and
> an instance (perhaps associated with a particular view) of that index
> definition.
>
> Also, in this discussion, the term CAS seems sometimes to be specific to
> what we might call the base-view, versus
> other specific "views" of the CAS.
>
> If we more clearly distinguish these, the conversation may be easier to
> follow.  I've tried to distinguish them below:

Ever since I've grasped the major concepts, I think we've been
communicating quite well ;-)


I agree, though being clearer on terminology can't hurt.  I've tried
to adhere to:

* "CAS" means the entire CAS.  It never means a specific view of the CAS.
* "Index Definition" means the declaration in the descriptor that
defines an index - giving it a label, kind of index, CAS type, and
sort keys.
* "Index" is an instance of an index definition - something that can
be retrieved by a getIndex() call and from which you can get an
iterator.
* "Physical Index" is an actual data structure holding references to
FeatureStructures.  This is transparent to the user, but sometimes we
need to talk about it if we're concerned about performance.

One of the things Adam and I had agreed on (Adam correct me if I'm
wrong) was that the base CAS, as you call it, is *not* a view.

+1.  This is central to our API renaming - we are envisioning creating
two interfaces: CAS and CasView.  An instance of CAS refers to an
entire CAS, which may contain multiple CasViews.  It is not consistent
with that to say that "the CAS is a view" or that "a view is a CAS".

Now what you say about sofas is interesting.  Currently, an index knows
nothing of views or sofas.  The only thing that is checked when adding a
FS to an index is the FS's type.  Are you suggesting that there should
be special code that prevents me from adding an annotation that I
created in one view to the index repository of another view?


In fact I believe that code already exists and it's not that
complicated (in our current implementation anyway).  Each annotation
has a feature that is a reference to the Sofa, and the view has a
reference to its Sofa.  So I think this is just an integer comparison
between these two values.
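
To make that concrete, here is a rough sketch of the kind of check I
mean.  This is not the actual implementation: the class names
(SofaCheckSketch, Annotation, View) and the plain-int Sofa references
are made up for illustration, assuming only that both the annotation
and the view hold a reference to a Sofa.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not UIMA code: plain ints stand in for Sofa references.
class SofaCheckSketch {

    /** An annotation carries a reference to the Sofa it annotates. */
    static final class Annotation {
        final int sofaRef;
        final String type;

        Annotation(int sofaRef, String type) {
            this.sofaRef = sofaRef;
            this.type = type;
        }
    }

    /** A view holds a reference to its own Sofa plus the annotations indexed in it. */
    static final class View {
        final int sofaRef;
        final List<Annotation> indexed = new ArrayList<>();

        View(int sofaRef) {
            this.sofaRef = sofaRef;
        }

        /** Rejects annotations whose Sofa reference differs from this view's. */
        void addToIndexes(Annotation a) {
            if (a.sofaRef != sofaRef) {   // the single integer comparison
                throw new IllegalArgumentException(
                    "Annotation refers to Sofa " + a.sofaRef
                    + " but this view is anchored to Sofa " + sofaRef);
            }
            indexed.add(a);
        }
    }

    public static void main(String[] args) {
        View view = new View(1);
        view.addToIndexes(new Annotation(1, "Person"));      // same Sofa: accepted
        try {
            view.addToIndexes(new Annotation(2, "Person"));  // other Sofa: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```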

This constraint is mentioned in the OASIS spec:  an "anchored view" is
a view that's tied to a Sofa, and it is a constraint that all
annotations that are members of an anchored view refer to that view's
Sofa.

A really simple approach would be to say that there are view-local index
definitions, and CAS-global index definitions.  For the view-local ones,
each view would have its own instance (and every view would have one).
For the CAS-global ones, there would be one instance in the CAS, shared
by all views.  However, that is just my current naive view of things.
Much more complicated schemes could be envisioned.


I'm not too worried about the specifiers.  A scheme like this would be
fine and fairly easy to add, if we first decide that this idea of
separate local/global index definitions is the way we want to go.
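
Just to pin down what I understand the local/global proposal to mean,
here is a toy model of it.  Everything in it is hypothetical (the
IndexScopeSketch, Scope, Cas and View names, the defineIndex method,
and using a list of strings as the "index"); it is only one possible
reading of the idea, not a design we have agreed on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one shared instance for CAS-global definitions,
// one instance per view for view-local definitions.
class IndexScopeSketch {

    enum Scope { VIEW_LOCAL, CAS_GLOBAL }

    static final class Cas {
        private final Map<String, Scope> definitions = new HashMap<>();
        private final Map<String, List<String>> globalIndexes = new HashMap<>();
        private final Map<String, View> views = new HashMap<>();

        /** Registers an index definition; global ones get their single instance here. */
        void defineIndex(String label, Scope scope) {
            definitions.put(label, scope);
            if (scope == Scope.CAS_GLOBAL) {
                globalIndexes.put(label, new ArrayList<>());
            }
        }

        /** Returns the named view, creating it on first use. */
        View getView(String name) {
            return views.computeIfAbsent(name, n -> new View(this));
        }
    }

    static final class View {
        private final Cas cas;
        private final Map<String, List<String>> localIndexes = new HashMap<>();

        View(Cas cas) {
            this.cas = cas;
        }

        /** Global labels resolve to the shared instance, local labels to this view's own. */
        List<String> getIndex(String label) {
            Scope scope = cas.definitions.get(label);
            if (scope == null) {
                throw new IllegalArgumentException("No index definition: " + label);
            }
            if (scope == Scope.CAS_GLOBAL) {
                return cas.globalIndexes.get(label);
            }
            return localIndexes.computeIfAbsent(label, l -> new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        cas.defineIndex("persons", Scope.CAS_GLOBAL);
        cas.defineIndex("tokens", Scope.VIEW_LOCAL);

        cas.getView("english").getIndex("persons").add("Person#1");
        cas.getView("english").getIndex("tokens").add("Token#1");

        // The global index is shared; the local one is separate per view.
        System.out.println(cas.getView("german").getIndex("persons").size());
        System.out.println(cas.getView("german").getIndex("tokens").size());
    }
}
```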


>> The only rule of visibility is that one view can not access the
>> view-specific indexes of another view.  Everything else is always
>> visible.
> I didn't follow this...

See above.  All CAS-global indexes are visible from/belong to every
view, view-local indexes have one instance per view, and no global
instance.  That's what I meant, but as I said, much more sophisticated
schemes could be imagined.


I think this is the key idea still to be nailed down, really.  Like
Marshall I don't think I completely understood what Thilo was
suggesting with the global indexes being visible from the views.  I
have a better understanding now but have some concerns.  This will be
the topic of my next email.


Marshall Schor wrote:
Re: Need for "Global indexes"
<snip>
What is the use case for the global view set of indexes? I can't recall
the use-case for this, beyond
being able to get all the data.   This thread has suggested other
utilities that can effectively
"merge" the results from other view's index instances. Are there other
use cases?

A hypothetical use case is that I want to get all Person mentions
(annotations) in the CAS, say because I'm going to populate a database
with their covered text and perhaps other feature values.

Of course, you could walk all views to do that.  But I'm suggesting
you shouldn't have to.  We could add a utility method to hide that
detail; I guess I'm OK with that.
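
For what it's worth, the utility I have in mind would look roughly
like the sketch below.  The names (AllViewsSketch, getAllOfType, and
the toy Cas and Annotation classes) are invented for the example and
are not existing API; the point is only that the caller asks the CAS
and never sees the loop over views.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "walk all views" helper hidden behind one call.
class AllViewsSketch {

    static final class Annotation {
        final String type;
        final String coveredText;

        Annotation(String type, String coveredText) {
            this.type = type;
            this.coveredText = coveredText;
        }
    }

    static final class Cas {
        // View name -> annotations indexed in that view (insertion order kept).
        private final Map<String, List<Annotation>> views = new LinkedHashMap<>();

        void add(String viewName, Annotation a) {
            views.computeIfAbsent(viewName, n -> new ArrayList<>()).add(a);
        }

        /** Collects annotations of the given type from every view. */
        List<Annotation> getAllOfType(String type) {
            List<Annotation> result = new ArrayList<>();
            for (List<Annotation> viewIndex : views.values()) {
                for (Annotation a : viewIndex) {
                    if (a.type.equals(type)) {
                        result.add(a);
                    }
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Cas cas = new Cas();
        cas.add("english", new Annotation("Person", "Adam"));
        cas.add("english", new Annotation("Token", "wrote"));
        cas.add("german", new Annotation("Person", "Thilo"));

        // e.g. the covered text that would go into the database
        for (Annotation person : cas.getAllOfType("Person")) {
            System.out.println(person.coveredText);
        }
    }
}
```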

Basically, this discussion is more about getting the concepts straight
than adding new functionality.  I'll say again:

(1) The CAS is the container for all of the analysis data (as per the
UIMA spec).  It must be possible to create FS directly on the CAS
and there must be some reasonable way to retrieve the FS in the CAS
without having to be concerned with views.

-Adam
