Michael notes an even more serious problem. When you later try to
retrieve all FSs of type t2, it is arbitrary which index is used: the one
created for t2, or the one for t1. If the one for t1 is used, not all
FSs of type t2 will be returned, as at least one was added to the t2
index before the t1 index was created.
Thilo Goetz wrote:
I guess I didn't really pay proper attention when I reviewed this
earlier. While going over the CAS documentation, a couple of (related)
issues became apparent that I hadn't noticed before, and that we should
fix before we release. Both issues have to do with inheritance.
Suppose you have two types, t1 and t2, where t2 inherits from t1.
Suppose further that no indexes are defined for t1 and t2. Now some
application tries to index FS2 of type t2, so a new bag index is created
for t2. Next, the application wants to index FS1 of type t1. No index
exists for t1, so one is created. Because of the way indexes work, the
one for t1 is also defined for t2. There are now two generic bag indexes
for t2, using up memory and CPU.
The same issue exists when an index for t2 has been defined by the user.
For annotators that are designed for high performance and create many
objects, this can have a severe performance impact.
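
To make the scenario concrete, here is a small toy model of the behavior described above. It is only a sketch and does not use the real CAS API: the class LazyBagIndexDemo, its methods, and the use of plain strings for types and FSs are all made up for illustration. The only assumptions taken from the description are that a bag index is created on demand when no index applies to a type, and that an index created for a type also applies to its subtypes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from the CAS API.
class LazyBagIndexDemo {
    // Hypothetical type hierarchy: child type -> parent type (t2 inherits from t1).
    static final Map<String, String> PARENT = Map.of("t2", "t1");

    // Default bag indexes, keyed by the type each one was created for.
    final Map<String, List<String>> bagIndexes = new LinkedHashMap<>();

    // True if 'type' is 'ancestor' itself or inherits from it.
    static boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Indexes that apply to 'type': those created for it or for a supertype.
    List<String> indexesFor(String type) {
        List<String> owners = new ArrayList<>();
        for (String owner : bagIndexes.keySet()) {
            if (subsumes(owner, type)) {
                owners.add(owner);
            }
        }
        return owners;
    }

    // Adds an FS; creates a bag index first if none applies to its type.
    void addToIndexes(String fs, String type) {
        if (indexesFor(type).isEmpty()) {
            bagIndexes.put(type, new ArrayList<>());
        }
        for (String owner : indexesFor(type)) {
            bagIndexes.get(owner).add(fs);
        }
    }

    public static void main(String[] args) {
        LazyBagIndexDemo repo = new LazyBagIndexDemo();
        repo.addToIndexes("FS2", "t2"); // creates the bag index for t2
        repo.addToIndexes("FS1", "t1"); // creates the bag index for t1
        repo.addToIndexes("FS3", "t2"); // now stored in both indexes
        System.out.println(repo.indexesFor("t2"));
        System.out.println(repo.bagIndexes);
    }
}
```

In this model, two indexes apply to t2 after the three calls, every later t2 FS is stored twice, and the index created for t1 never sees FS2 because FS2 was added before that index existed.
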
I would like to propose the following solution, though I'm not sure how
difficult it will be to implement. We create a special kind of index
that does not follow the usual type inheritance scheme (i.e., if it's
defined for t1, it's not necessarily defined for t2). We create this
index for all types without an index the first time somebody tries to
index an FS with no index defined. This global index will have some
performance impact on all indexing operations, but I hope it will be
slight, and certainly less problematic than what we have now.
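
For discussion, here is one possible reading of the proposal as a toy model. Again, this is a sketch with made-up names (CatchAllIndexDemo, catchAll, getAll), not the CAS API, and it assumes that the special index stores each FS once under its exact type and that inheritance is resolved only when FSs are retrieved. A real implementation may well look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from the CAS API.
class CatchAllIndexDemo {
    // Hypothetical type hierarchy: child type -> parent type (t2 inherits from t1).
    static final Map<String, String> PARENT = Map.of("t2", "t1");

    // The single special index, keyed by exact type. It is created once,
    // the first time an FS with no index defined is added.
    Map<String, List<String>> catchAll = null;

    // True if 'type' is 'ancestor' itself or inherits from it.
    static boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Stores the FS exactly once, under its own type only.
    void addToIndexes(String fs, String type) {
        if (catchAll == null) {
            catchAll = new LinkedHashMap<>();
        }
        catchAll.computeIfAbsent(type, k -> new ArrayList<>()).add(fs);
    }

    // Returns all FSs of 'type' and its subtypes, whatever the insertion order was.
    List<String> getAll(String type) {
        List<String> result = new ArrayList<>();
        if (catchAll == null) {
            return result;
        }
        for (Map.Entry<String, List<String>> entry : catchAll.entrySet()) {
            if (subsumes(type, entry.getKey())) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }

    // Total number of stored entries; equals the number of FSs added.
    int storedEntries() {
        int count = 0;
        if (catchAll != null) {
            for (List<String> fss : catchAll.values()) {
                count += fss.size();
            }
        }
        return count;
    }
}
```

With the same sequence as in the scenario above (FS2 of type t2, FS1 of type t1, FS3 of type t2), this model stores three entries in total, and retrieving all FSs of type t2 yields FS2 and FS3 no matter which type was indexed first. The cost is the extra subtype check on retrieval, which is the kind of slight overhead mentioned above.
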
Opinions?
--Thilo