Marshall Schor wrote:
In this discussion, I think some confusion arises from the use of "index" to mean both the index definition, and an instance (perhaps associated with a particular view) of that index definition.

Also, in this discussion, the term CAS seems sometimes to be specific to what we might call the base-view, versus other specific "views" of the CAS.

If we more clearly distinguish these, the conversation may be easier to follow. I've tried to distinguish them below:

Ever since I grasped the major concepts, I think we've been communicating quite well ;-)

<snip>
Logically, the part of the Index Repository which has the definitions is not duplicated; only the actual index instances are. We think there is a way to make the actual creation of the index instances "lazy" - in the sense that for performance / overhead reasons, they are not created until the first attempt to add an FS to that index instance in that particular view.

I assume you are talking about the current implementation. I'm not sure what this laziness would buy us. An empty index consumes virtually no space. Unless we have reason to believe that there are significant gains to be had, I would vote for simplicity and against optimization.

<snip>
I didn't mean to suggest to have duplicate indexes. What I meant to say was, each view should have its own annotation index.
In fact, today, each view has its own complete set of index instances, one per index definition.

And that is not a good thing. Global indexes should be shared. That is also the spirit of the OASIS draft, I think. The draft spec doesn't talk about indexes, true, but it certainly has been informed by our implementation.

In the CAS, each of these annotation indexes can be accessed separately. In fact, I think this is pretty much what you're saying as well. I don't see a use case for a global merged annotation index, other than tooling and utilities. And even for tooling, I think it makes sense to access the annotation for each view separately. If we need to iterate over annotations from different views sorted by their offsets, irrespective of the sofa they point into, we can provide a utility function that does that on the fly.
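
To make the utility idea above a bit more concrete, here is a minimal sketch in plain Java. It is not UIMA code: OffsetAnn and ViewMerger are hypothetical names invented for illustration, and the sort order (begin, then end) is an assumption rather than the ordering of the real annotation index. It assumes each view hands over its annotations already sorted, and merges them at request time instead of maintaining a global merged index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for an annotation; not the UIMA Annotation class.
final class OffsetAnn {
    final String view;  // name of the view the annotation was taken from
    final int begin;
    final int end;

    OffsetAnn(String view, int begin, int end) {
        this.view = view;
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical utility: merges per-view annotation lists by offset.
final class ViewMerger {
    // Assumed ordering: begin ascending, then end ascending.
    static final Comparator<OffsetAnn> BY_OFFSET =
        Comparator.comparingInt((OffsetAnn a) -> a.begin).thenComparingInt(a -> a.end);

    // Tracks the next unread annotation of one view.
    private static final class Cursor {
        OffsetAnn head;
        final Iterator<OffsetAnn> rest;

        Cursor(Iterator<OffsetAnn> rest) {
            this.rest = rest;
            this.head = rest.next();
        }

        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            head = rest.next();
            return true;
        }
    }

    // Each input list is assumed to be sorted by BY_OFFSET already.
    static List<OffsetAnn> mergeByOffset(List<List<OffsetAnn>> perViewIndexes) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((x, y) -> BY_OFFSET.compare(x.head, y.head));
        for (List<OffsetAnn> index : perViewIndexes) {
            Iterator<OffsetAnn> it = index.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it));
            }
        }
        List<OffsetAnn> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.add(c.head);
            if (c.advance()) {
                queue.add(c);  // re-insert with its new head
            }
        }
        return merged;
    }
}
```

Nothing is stored between calls, so the per-view indexes remain the only place annotations live; the merged order exists only for the caller that asked for it.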

Note however that this implies that one should never do addFsToIndexes() on the CAS with an annotation, as it would be added to all annotation indexes.
I think this means not to do an addFsToIndexes() with an Annotation on the CAS view which is the "base CAS". The current design would disallow this because an Annotation (which has a reference to a Sofa) is only allowed to be added to index instances that belong to the view which has that Sofa.

One of the things Adam and I had agreed on (Adam correct me if I'm wrong) was that the base CAS, as you call it, is *not* a view. What Adam was proposing for backward compatibility was a notion of a "current view", which is directly accessible through the CAS APIs. However, those are just convenience/compatibility APIs. Conceptually, the current view is a view of its own and could (should for new code) be accessed through regular view APIs.

Now what you say about sofas is interesting. Currently, an index knows nothing of views or sofas. The only thing that is checked when adding a FS to an index is the FS's type. Are you suggesting that there should be special code that prevents me from adding an annotation that I created in one view to the index repository of another view?

That might be desirable, but it will get complicated and expensive. I think we need to document this point carefully and hope that users understand that they shouldn't be doing this. It would be very hard to prevent all misuses of sofas/views.

My suggestion implies that the index repository itself is agnostic of views and sofas. If you add an annotation to the wrong repository, it's your own fault.
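
The difference between the two behaviours discussed above can be sketched in a few lines of plain Java. This is a simplified model, not the CAS implementation: SketchFs, SketchIndex and SketchView are hypothetical names, and the type check is reduced to a string comparison (a real index would also accept subtypes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified feature structure; not a real CAS class.
final class SketchFs {
    final String type;    // type name of the feature structure
    final String sofaId;  // sofa an annotation points into; null for non-annotations

    SketchFs(String type, String sofaId) {
        this.type = type;
        this.sofaId = sofaId;
    }
}

// An index that is agnostic of views and sofas: it checks the type only.
final class SketchIndex {
    private final String indexedType;
    private final List<SketchFs> entries = new ArrayList<>();

    SketchIndex(String indexedType) {
        this.indexedType = indexedType;
    }

    // Simplification: exact type match only, no subtype handling.
    boolean add(SketchFs fs) {
        if (!indexedType.equals(fs.type)) {
            return false;
        }
        entries.add(fs);
        return true;
    }

    int size() {
        return entries.size();
    }
}

// A view that owns one annotation index and knows its own sofa.
final class SketchView {
    final String sofaId;
    final SketchIndex annotationIndex = new SketchIndex("Annotation");

    SketchView(String sofaId) {
        this.sofaId = sofaId;
    }

    // The stricter variant: reject annotations that point into another sofa.
    boolean addChecked(SketchFs fs) {
        if (fs.sofaId != null && !fs.sofaId.equals(sofaId)) {
            return false;
        }
        return annotationIndex.add(fs);
    }
}
```

In this model, adding through annotationIndex.add() accepts an annotation created for a different sofa, which is the "your own fault" case; addChecked() shows where an extra check would have to sit if we wanted to prevent it.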

So to summarize, I would suggest that annotation indexes, for example, only live in views, there is no global annotation index (neither conceptually, nor physically). To access annotations from the CAS, you still need to access view-specific indexes.

Non-sofa indexes, on the other hand, only exist in the global namespace.
I'm not sure what this means. If it means an index over some non-Annotation type is not allowed to be part of a view, this seems to go against the idea of allowing "views" to hold subsets of FeatureStructures. So I don't think that's a good idea here.

What I mean is that there is no way for a global index not to be part of a view. Or without the double negation: every global index is part of every view.

Why not make this simpler by having a uniform approach: each view has its own set of index instances (drawn from perhaps a global set of index definitions, or perhaps some localized set of index definitions - that part to be worked out), whether or not the index is over Annotations.

From a CAS implementation perspective, that's easy to do. I have no problems with a view being an arbitrary subset of the set of all index instances.

What I don't see, and maybe it's just because I don't understand the specification side of things, is how we're going to describe this in our specifiers.

A really simple approach would be to say that there are view-local index definitions, and CAS-global index definitions. For the view-local ones, each view would have its own instance (and every view would have one). For the CAS-global ones, there would be one instance in the CAS, shared by all views. However, that is just my current naive view of things. Much more complicated schemes could be envisioned.
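
As a rough illustration of that simple approach, here is a sketch in plain Java. ScopedIndexCas, Scope, define() and getIndex() are all hypothetical names made up for this example, and an index is modelled as a bare list. One simplification to note: instances are created on first access, which from the outside looks the same as every view having its own instance up front.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a CAS with view-local and CAS-global index definitions.
final class ScopedIndexCas {
    enum Scope { VIEW_LOCAL, CAS_GLOBAL }

    // Index definitions exist once, regardless of scope.
    private final Map<String, Scope> definitions = new HashMap<>();
    // One shared instance per CAS-global definition.
    private final Map<String, List<Object>> globalInstances = new HashMap<>();
    // One instance per (view, view-local definition) pair.
    private final Map<String, Map<String, List<Object>>> viewInstances = new HashMap<>();

    void define(String indexName, Scope scope) {
        definitions.put(indexName, scope);
    }

    // Returns the index instance that the given view sees under this name.
    List<Object> getIndex(String viewName, String indexName) {
        Scope scope = definitions.get(indexName);
        if (scope == null) {
            throw new IllegalArgumentException("No such index definition: " + indexName);
        }
        if (scope == Scope.CAS_GLOBAL) {
            return globalInstances.computeIfAbsent(indexName, k -> new ArrayList<>());
        }
        return viewInstances
            .computeIfAbsent(viewName, k -> new HashMap<>())
            .computeIfAbsent(indexName, k -> new ArrayList<>());
    }
}
```

With an index defined as VIEW_LOCAL, something added through one view is not visible through another; with CAS_GLOBAL, every view gets back the same instance.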

The only rule of visibility is that one view cannot access the view-specific indexes of another view. Everything else is always visible.
I didn't follow this...

See above. All CAS-global indexes are visible from/belong to every view, view-local indexes have one instance per view, and no global instance. That's what I meant, but as I said, much more sophisticated schemes could be imagined.

--Thilo
