Re: CAS and CasView redesign - question if all views should share the same indexes?

Thilo Goetz Fri, 22 Dec 2006 00:19:06 -0800

Marshall Schor wrote:

In this discussion, I think some confusion arises from the use of "index" to mean both the index definition, and an instance (perhaps associated with a particular view) of that index definition.
Also, in this discussion, the term CAS seems sometimes to be specific to what we might call the base-view, versus
other specific "views" of the CAS.
If we more clearly distinguish these, the conversation may be easier to follow. I've tried to distinguish them below:

Ever since I've grasped the major concepts, I think we've been communicating quite well ;-)


<snip>

Logically, the part of the Index Repository which has the definitions is not duplicated; only the actual index instances are. We think there is a way to make the actual creation of the index instances "lazy" - in the sense that for performance / overhead reasons, they are not created until the first attempt to add a FS to that index instance in that particular view.

I assume you are talking about the current implementation. I'm not sure what this laziness would buy us. An empty index consumes virtually no space. Unless we have reason to believe that there are significant gains to be had, I would vote for simplicity and against optimization.


<snip>

I didn't mean to suggest to have duplicate indexes. What I meant to say was, each view should have its own annotation index.
In fact, today, each view has its own complete set of index instances, one per each index definition.

And that is not a good thing. Global indexes should be shared. That is also the spirit of the OASIS draft, I think. The draft spec doesn't talk about indexes, true, but it certainly has been informed by our implementation.

In the CAS, each of these annotation indexes can be accessed separately. In fact, I think this is pretty much what you're saying as well. I don't see a use case for a global merged annotation index, other than tooling and utilities. And even for tooling, I think it makes sense to access the annotations for each view separately. If we need to iterate over annotations from different views sorted by their offsets, irrespective of the sofa they point into, we can provide a utility function that does that on the fly.
Note however that this implies that one should never do addFsToIndexes() on the CAS with an annotation, as it would be added to all annotation indexes.
I think this means not to do an "addFsToIndexes()" with an Annotation on the Cas View which is the "base CAS". The current design would disallow this because an Annotation (which has a reference to a Sofa) is only allowed to be added to index instances that belong to the view which has that Sofa.
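The on-the-fly utility mentioned above could look roughly like the following. This is a minimal sketch and not UIMA code: the Annotation class, its fields, and mergeByOffset are hypothetical names made up for illustration, and the sort order (begin ascending, then end descending) is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of this is the UIMA API.
class MergedAnnotationsSketch {

    static final class Annotation {
        final String view; // name of the view (sofa) the annotation points into
        final int begin;
        final int end;

        Annotation(String view, int begin, int end) {
            this.view = view;
            this.begin = begin;
            this.end = end;
        }
    }

    // Builds a merged, offset-sorted list from the per-view annotation indexes.
    // The per-view indexes themselves are left untouched; nothing global is stored.
    static List<Annotation> mergeByOffset(
            Collection<? extends Collection<Annotation>> perViewIndexes) {
        List<Annotation> merged = new ArrayList<>();
        for (Collection<Annotation> index : perViewIndexes) {
            merged.addAll(index);
        }
        // Assumed ordering: begin ascending, then end descending.
        merged.sort(Comparator.comparingInt((Annotation a) -> a.begin)
                .thenComparing(Comparator.comparingInt((Annotation a) -> a.end).reversed()));
        return merged;
    }
}
```

Because the merged list is computed on demand, no global annotation index has to exist, either conceptually or physically.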

One of the things Adam and I had agreed on (Adam correct me if I'm wrong) was that the base CAS, as you call it, is *not* a view. What Adam was proposing for backward compatibility was a notion of a "current view", which is directly accessible through the CAS APIs. However, those are just convenience/compatibility APIs. Conceptually, the current view is a view of its own and could (should for new code) be accessed through regular view APIs.

Now what you say about sofas is interesting. Currently, an index knows nothing of views or sofas. The only thing that is checked when adding a FS to an index is the FS's type. Are you suggesting that there should be special code that prevents me from adding an annotation that I created in one view to the index repository of another view?

That might be desirable, but it will get complicated and expensive. I think we need to document this point carefully and hope that users understand that they shouldn't be doing this. It would be very hard to prevent all misuses of sofas/views.
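To make the point about type-only checking concrete, here is a minimal sketch of an index that knows nothing of views or sofas. All class and method names are hypothetical and do not reflect the actual CAS implementation; the sofaId field exists only to show that the index never looks at it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration; not the actual CAS index code.
class TypeOnlyIndexSketch {

    static class FeatureStructure { }

    static class Annotation extends FeatureStructure {
        final String sofaId; // the sofa this annotation points into
        Annotation(String sofaId) { this.sofaId = sofaId; }
    }

    static final class Index<T extends FeatureStructure> {
        private final Class<T> type;
        private final List<T> entries = new ArrayList<>();

        Index(Class<T> type) { this.type = type; }

        // The only check is the FS's type. The index has no notion of
        // which view or sofa the feature structure was created for.
        boolean add(FeatureStructure fs) {
            if (!type.isInstance(fs)) {
                return false;
            }
            entries.add(type.cast(fs));
            return true;
        }

        int size() { return entries.size(); }
    }
}
```

With this design, an index that a caller thinks of as belonging to view B will accept an annotation whose sofa is A. Preventing that would require the index to know about views, which is the extra complexity and cost questioned above.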

My suggestion implies that the index repository itself is agnostic of views and sofas. If you add an annotation to the wrong repository, it's your own fault.
So to summarize, I would suggest that annotation indexes, for example, only live in views, there is no global annotation index (neither conceptually, nor physically). To access annotations from the CAS, you still need to access view-specific indexes.
Non-sofa indexes, on the other hand, only exist in the global namespace.
I'm not sure what this means. If it means an index over some non-Annotation type is not allowed to be part of a view, this seems to go against the idea of allowing "views" to hold subsets of FeatureStructures. So I don't think that's a good idea here.

What I mean is that there is no way for a global index not to be part of a view. Or without the double negation: every global index is part of every view.

Why not make this simpler by having a uniform approach: each view has its own set of index instances (drawn from perhaps a global set of index definitions, or perhaps some localized set of index definitions - that part to be worked out), whether or not the index is over Annotations.

From a CAS implementation perspective, that's easy to do. I have no problems with a view being an arbitrary subset of the set of all index instances.

What I don't see, and maybe it's just because I don't understand the specification side of things, is how we're going to describe this in our specifiers.

A really simple approach would be to say that there are view-local index definitions, and CAS-global index definitions. For the view-local ones, each view would have its own instance (and every view would have one). For the CAS-global ones, there would be one instance in the CAS, shared by all views. However, that is just my current naive view of things. Much more complicated schemes could be envisioned.
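The simple scheme above can be sketched as follows. This is only an illustration under assumptions: Scope, Cas, View, defineIndex and getIndex are hypothetical names, not the UIMA API, and plain lists stand in for real index instances. Whether view-local instances are created eagerly or on first access is not the point of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; not the UIMA API.
class IndexScopeSketch {

    enum Scope { VIEW_LOCAL, CAS_GLOBAL }

    static final class Cas {
        // Index definitions live once, in the CAS.
        private final Map<String, Scope> definitions = new HashMap<>();
        // One shared instance per CAS-global definition.
        private final Map<String, List<Object>> globalInstances = new HashMap<>();
        private final Map<String, View> views = new HashMap<>();

        void defineIndex(String label, Scope scope) {
            definitions.put(label, scope);
            if (scope == Scope.CAS_GLOBAL) {
                globalInstances.put(label, new ArrayList<>());
            }
        }

        View getView(String name) {
            return views.computeIfAbsent(name, n -> new View(this));
        }
    }

    static final class View {
        private final Cas cas;
        // One instance per view for each view-local definition.
        private final Map<String, List<Object>> localInstances = new HashMap<>();

        View(Cas cas) { this.cas = cas; }

        List<Object> getIndex(String label) {
            Scope scope = cas.definitions.get(label);
            if (scope == null) {
                throw new IllegalArgumentException("No such index: " + label);
            }
            if (scope == Scope.CAS_GLOBAL) {
                return cas.globalInstances.get(label); // shared by all views
            }
            return localInstances.computeIfAbsent(label, l -> new ArrayList<>());
        }
    }
}
```

In this sketch a view has no way to reach another view's local instance, while every CAS-global instance is reachable from every view.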

The only rule of visibility is that one view can not access the view-specific indexes of another view. Everything else is always visible.
I didn't follow this...

See above. All CAS-global indexes are visible from/belong to every view, view-local indexes have one instance per view, and no global instance. That's what I meant, but as I said, much more sophisticated schemes could be imagined.


--Thilo
