Adam Lally wrote:
On 5/10/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
While working on the class-loader switching code, we have revisited an
issue with the way JCas objects work with respect to views.

Currently, for each view, there is a separate set of xxx_Type objects, a
separate set of "cached" cover objects (which are identical to other
view's objects, except that their _Type ref points to the instance for
this view).

This is there only (as far as we can see) to support
aJCasObject.addToIndexes() and removeFromIndexes() which uses this
information to pick the right "view" to use (remember that indexes are
held per view).

Besides inefficiency (replication of objects per view), there is another
side-effect.  JCas Objects can, themselves, be extended by users to hold
additional information, other than what's in the CAS.  The current
design would create new versions of these objects per view, so that
iterators over different views would get different instances.  So
information set into one JCas object in one view would not be "visible"
to instances obtained by iterating using a different view's index.  This
could be a documented "feature", or it could be a "bug".

Because current users seem to often use the aJCasObject.addToIndexes()
method, I want to retain that method, while getting the efficiencies and
fixing the "bug" (if we consider it a bug) above.  To do this, we could
make this work as before *for sofa-unaware annotators, only* as
follows:  Change the impl of addToIndexes and removeFromIndexes to
reference the "current-view".


It makes me very nervous to put in changes that intentionally break
compatibility with existing annotators.  One of UIMA's main goals is
to make it easier to integrate analytics into applications.  We don't
want application developers to be concerned that if they update their
UIMA version, suddently components that used to work will no longer
work.  Sometimes that means we're stuck with a suboptimal design for
something, but that's life.  If we *really* want to stop supporting
this method, deprecate it first and then wait a few years for the 3.0
release to come out, and think about doing it then. ;)

Right, but it depends on the likelyhood of breaking things - if we can get it close enough to 0, maybe ...

It seems the argument here is that there aren't very many multi-sofa
annotators, so breaking them (the ones that use JCas anyway) is not
that big a deal.  I'm just not sure how to judge that.

With that general comment out of the way, let me consider this
specific issue.  I think the code that would break is this:

JCas someView = baseJCas.getView(name);
MyAnnotation annot = new MyAnnotation(someView, begin, end);
annot.addToIndexes();


We could make this code work: We could change the impl of getView(name) to set the *current-view*. This isn't perfect. This breaks:

JCas someView_1 = baseJCas.getView(name1);
JCas someView_2 = baseJCas,getView(name2);

(new MyAnnotation(someView1)).addToIndexes();  // wrongly adds to view_2
(new MyAnnotation(someView2)).addToIndexes();  // adds to view_2

This could also be made to work; by setting the *current-view* when the new operator is executed. Again, it isn't perfect.. But maybe we approach 0 breakage in practice :-) (or maybe not - hard to tell...)

But wait.. for types derived from annotations, we already know what
view it should be indexed in.  Just follow the annotation's Sofa
reference and you will find the right view.  It's not valid to index
it in any other view, and in fact that results in an exception.

So instead of indexing in the "current view" (which might fail), for
annotations you could always index in the "correct view". :)  This
should not break any annotators.

Great idea - we should do perhaps in any case.
That leaves non-annotation types (and only those with custom indexes
defined for them matter).  For those we could:

(a) Decide we don't care about breaking this.  At this point the
number of affected annotators might be zero, but we can't be sure.  I
still don't like it on general principles.

(b) Optimize what we can get away with.  Only create extra _Type
objects for non-annotation types for which custom indexes are defined.
That has the downside of probably making the JCas code ugly and more
prone to bugs.

Next thought... what if instead of the separate _Type objects, you
maintained in the JCasImpl a map from JCas object to "home view".  As
above you only need to do this for non-annotation objects that have
indexes defined for them (AND if the home view is not the initial
view, since that could be the default).

That has different performance characteristics which may or may not
necessarily be better (it depends on how many instances get created I
think), but maybe it is a cleaner way to stay compatible.
Not sure I follow this... Let's see - you would have one generator set, which iterators would use to generate a cover instance, for all the different views, . If a cover instance already existed, you would need to examine this map-to-home-view, and see if the home view was the right one. If not, you would generate an additional cover instance. The cache from key=CAS addr to value=cover-JCas-instance would need to map to a collection of cover-JCas-instances, possibly one per view in which it was indexed. Is this what you were thinking?

Other things to consider:
The (implicit) contract that co-located JCas things get the same instance, so if the user added Java fields, they would be visible / shared, breaks with the current design, but would be "fixed" with the change to having only one JCas cover instance (and using *current-view* as the implicit view to use with addToIndexes() etc.).

My feeling is with setting *current-view* on entry to user code being passed a CAS, and on anything that returns a CAS (like getView(...)), and fixing the subtypes of Annotation case, we cover enough of this space to make breaking unlikely enough to make this worth doing, to avoid breaking the (implicit) contract about use of JCas in co-located components. We can document the workaround when you start having annotators working with multiple views at the same time - to say users should specify which view they want things indexed in (or things removed from).

This change will potentially improve performance (space/time) also, for multi-view applications using JCas, and impacts the design of the Class Loader switching mechanism. Because we'd like to get that done this month, I'd like to hear more comments about this, so we can decide sooner rather than later which direction to go in...


Hope that helped,
 -Adam

Yes, it did, thanks! -Marshall

Reply via email to