Marshall Schor wrote:
In this discussion, I think some confusion arises from the use of
"index" to mean both the index definition, and an instance (perhaps
associated with a particular view) of that index definition.
Also, in this discussion, the term CAS seems sometimes to be specific to
what we might call the base-view, versus other specific "views" of the
CAS.
If we more clearly distinguish these, the conversation may be easier to
follow. I've tried to distinguish them below:
Ever since I've grasped the major concepts, I think we've been
communicating quite well ;-)
<snip>
Logically, the part of the Index Repository which has the definitions is
not duplicated; only the actual index instances are. We think there is
a way to make the actual creation of the index instances "lazy" - in the
sense that for performance / overhead reasons, they are not created
until the first attempt to add a FS to that index instance in that
particular view.
I assume you are talking about the current implementation. I'm not sure
what this laziness would buy us. An empty index consumes virtually no
space. Unless we have reason to believe that there are significant
gains to be had, I would vote for simplicity and against optimization.
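For reference, here is a minimal sketch of what such lazy creation could
look like. It is not the UIMA implementation: the class LazyViewIndexes
and all of its methods are hypothetical names made up for illustration,
and an index is modelled as a plain list. The definitions are shared
between views; an instance only comes into existence on the first add.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not UIMA code: one object of this class stands for
// the index instances of a single view.
final class LazyViewIndexes {
    // Shared index definitions (here just labels); never duplicated per view.
    private final Set<String> definitions;
    // Index instances of this view, created on first add only.
    private final Map<String, List<Object>> instances = new HashMap<>();

    LazyViewIndexes(Set<String> sharedDefinitions) {
        this.definitions = sharedDefinitions;
    }

    // Adds an FS to the named index, creating the instance if needed.
    void add(String label, Object fs) {
        if (!definitions.contains(label)) {
            throw new IllegalArgumentException("No index definition: " + label);
        }
        instances.computeIfAbsent(label, k -> new ArrayList<>()).add(fs);
    }

    // Size of the named index; an index that was never created counts as empty.
    int size(String label) {
        List<Object> index = instances.get(label);
        return index == null ? 0 : index.size();
    }

    // Number of index instances that actually exist in this view.
    int instantiatedCount() {
        return instances.size();
    }
}
```

Two views built over the same definition set share nothing but the
definitions. The extra null handling in size() is the kind of added
complexity the simplicity argument above is about.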
<snip>
I didn't mean to suggest to have duplicate indexes. What I meant to
say was, each view should have its own annotation index.
In fact, today, each view has its own complete set of index instances,
one per index definition.
And that is not a good thing. Global indexes should be shared. That is
also the spirit of the OASIS draft, I think. The draft spec doesn't
talk about indexes, true, but it certainly has been informed by our
implementation.
In the CAS, each of these annotation indexes can be accessed
separately. In fact, I think this is pretty much what you're saying
as well. I don't see a use case for a global merged annotation index,
other than tooling and utilities. And even for tooling, I think it
makes sense to access the annotation for each view separately. If we
need to iterate over annotations from different views sorted by their
offsets, irrespective of the sofa they point into, we can provide a
utility function that does that on the fly.
Note however that this implies that one should never do
addFsToIndexes() on the CAS with an annotation, as it would be added
to all annotation indexes.
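The on-the-fly utility mentioned above could be a simple k-way merge over
the per-view annotation indexes. The following is only a sketch under
assumptions: SimpleAnnotation and CrossViewMerge are hypothetical
stand-ins, not UIMA classes, each per-view list is assumed to be already
sorted, and the ordering used (begin ascending, then end descending) is
an assumption about what "sorted by their offsets" should mean.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for an annotation: the view it lives in plus offsets.
final class SimpleAnnotation {
    final String viewName;
    final int begin;
    final int end;

    SimpleAnnotation(String viewName, int begin, int end) {
        this.viewName = viewName;
        this.begin = begin;
        this.end = end;
    }
}

final class CrossViewMerge {
    // Assumed ordering: begin ascending, then end descending (longer span first).
    static final Comparator<SimpleAnnotation> BY_OFFSET =
        Comparator.comparingInt((SimpleAnnotation a) -> a.begin)
            .thenComparing(Comparator.comparingInt((SimpleAnnotation a) -> a.end).reversed());

    // Merges per-view lists (each already sorted by BY_OFFSET) into one sorted list.
    static List<SimpleAnnotation> merge(List<List<SimpleAnnotation>> perView) {
        // Each cursor is {viewNumber, positionInThatView}; cursors are ordered
        // by the annotation they currently point at.
        Comparator<int[]> byCurrent = (x, y) -> BY_OFFSET.compare(
            perView.get(x[0]).get(x[1]), perView.get(y[0]).get(y[1]));
        PriorityQueue<int[]> cursors = new PriorityQueue<>(byCurrent);
        for (int v = 0; v < perView.size(); v++) {
            if (!perView.get(v).isEmpty()) {
                cursors.add(new int[] {v, 0});
            }
        }
        List<SimpleAnnotation> merged = new ArrayList<>();
        while (!cursors.isEmpty()) {
            int[] cursor = cursors.poll();
            List<SimpleAnnotation> source = perView.get(cursor[0]);
            merged.add(source.get(cursor[1]));
            // Advance this view's cursor if it has more annotations.
            if (cursor[1] + 1 < source.size()) {
                cursors.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }
}
```

Nothing is stored globally here; the merged order exists only for the
duration of the call, which is the point of doing it on the fly.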
I think this means not to do an addFsToIndexes() with an Annotation on
the CAS view which is the "base CAS". The current design would
disallow this because an Annotation (which has a reference to a Sofa) is
only allowed to be added to index instances that belong to the view
which has that Sofa.
One of the things Adam and I had agreed on (Adam correct me if I'm
wrong) was that the base CAS, as you call it, is *not* a view. What
Adam was proposing for backward compatibility was a notion of a "current
view", which is directly accessible through the CAS APIs. However,
those are just convenience/compatibility APIs. Conceptually, the
current view is a view of its own and could (should for new code) be
accessed through regular view APIs.
Now what you say about sofas is interesting. Currently, an index knows
nothing of views or sofas. The only thing that is checked when adding a
FS to an index is the FS's type. Are you suggesting that there should
be special code that prevents me from adding an annotation that I
created in one view to the index repository of another view?
That might be desirable, but it will get complicated and expensive. I
think we need to document this point carefully and hope that users
understand that they shouldn't be doing this. It would be very hard to
prevent all misuses of sofas/views.
My suggestion implies that the index repository itself is agnostic of
views and sofas. If you add an annotation to the wrong repository,
it's your own fault.
So to summarize, I would suggest that annotation indexes, for example,
only live in views, there is no global annotation index (neither
conceptually, nor physically). To access annotations from the CAS,
you still need to access view-specific indexes.
Non-sofa indexes, on the other hand, only exist in the global namespace.
I'm not sure what this means. If it means an index over some
non-Annotation type is not allowed to be part of a view, this seems to
go against the idea of allowing "views" to hold subsets of
FeatureStructures. So I don't think that's a good idea here.
What I mean is that there is no way for a global index not to be part of
a view. Or without the double negation: every global index is part of
every view.
Why not make this simpler by having a uniform approach: each view has
its own set of index instances (drawn from perhaps a global set of index
definitions, or perhaps some localized set of index definitions - that
part to be worked out), whether or not the index is over Annotations.
From a CAS implementation perspective, that's easy to do. I have no
problems with a view being an arbitrary subset of the set of all index
instances.
What I don't see, and maybe it's just because I don't understand the
specification side of things, is how we're going to describe this in our
specifiers.
A really simple approach would be to say that there are view-local index
definitions, and CAS-global index definitions. For the view-local ones,
each view would have its own instance (and every view would have one).
For the CAS-global ones, there would be one instance in the CAS, shared
by all views. However, that is just my current naive view of things.
Much more complicated schemes could be envisioned.
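To make the simple approach concrete, here is a minimal sketch of it.
Everything in it is hypothetical: IndexScope, SketchCas and SketchView
are illustration names, not existing or proposed UIMA APIs, and an index
is again just a list. A view-local definition yields one instance per
view, a CAS-global definition yields a single instance that every view
sees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: whether a definition is instantiated per view or once per CAS.
enum IndexScope { VIEW_LOCAL, CAS_GLOBAL }

// Hypothetical CAS: owns the definitions, the global instances and the views.
final class SketchCas {
    final Map<String, IndexScope> definitions = new HashMap<>();
    final Map<String, List<Object>> globalInstances = new HashMap<>();
    private final Map<String, SketchView> views = new HashMap<>();

    void defineIndex(String label, IndexScope scope) {
        definitions.put(label, scope);
    }

    // Returns the named view, creating it on first request.
    SketchView getView(String name) {
        return views.computeIfAbsent(name, n -> new SketchView(this));
    }
}

// Hypothetical view: holds its own instances of the view-local definitions.
final class SketchView {
    private final SketchCas cas;
    private final Map<String, List<Object>> localInstances = new HashMap<>();

    SketchView(SketchCas cas) {
        this.cas = cas;
    }

    // Looks up an index by label; global ones resolve to the shared instance.
    List<Object> getIndex(String label) {
        IndexScope scope = cas.definitions.get(label);
        if (scope == null) {
            throw new IllegalArgumentException("No index definition: " + label);
        }
        Map<String, List<Object>> owner =
            scope == IndexScope.CAS_GLOBAL ? cas.globalInstances : localInstances;
        return owner.computeIfAbsent(label, k -> new ArrayList<>());
    }
}
```

With this shape, a view has no way to reach another view's local
instances through its own getIndex(), while every global index is
reachable from every view.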
The only rule of visibility is that one view cannot access the
view-specific indexes of another view. Everything else is always
visible.
I didn't follow this...
See above. All CAS-global indexes are visible from/belong to every
view, view-local indexes have one instance per view, and no global
instance. That's what I meant, but as I said, much more sophisticated
schemes could be imagined.
--Thilo