I guess I didn't really pay proper attention when I reviewed this earlier. While going over the CAS documentation, a couple of (related) issues became apparent that I hadn't noticed before, and that we should fix before we release. Both issues have to do with inheritance.

Suppose you have two types, t1 and t2, where t2 inherits from t1. Suppose further that no indexes are defined for t1 or t2. Now some application indexes FS2 of type t2, and a new bag index is created for t2. Next, the application wants to index FS1 of type t1. No index exists for t1, so one is created. Because of the way indexes work, the one for t1 also applies to t2. There are now two generic bag indexes for t2, using up memory and CPU.
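
To make the scenario concrete, here is a toy Java model of it. This is only a sketch: ToyIndexRepository and all of its methods are made up for illustration and are not the CAS API. It assumes just the behaviour described above, namely that a bag index is created on demand when no index applies to a type, and that an index created for a type also applies to its subtypes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are part of the real CAS API.
class ToyIndexRepository {
    // Maps each type to its parent; root types have no entry.
    private final Map<String, String> parent = new HashMap<>();
    // Bag indexes keyed by the type they were created for.
    private final Map<String, List<String>> bagIndexes = new LinkedHashMap<>();

    void addType(String type, String parentType) {
        if (parentType != null) {
            parent.put(type, parentType);
        }
    }

    // True if 'type' equals 'ancestor' or inherits from it.
    boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = parent.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Number of bag indexes that apply to 'type', including inherited ones.
    int indexesCovering(String type) {
        int n = 0;
        for (String indexed : bagIndexes.keySet()) {
            if (subsumes(indexed, type)) {
                n++;
            }
        }
        return n;
    }

    // Indexes an FS; creates a bag index for its type if none applies yet.
    void addToIndexes(String type, String fs) {
        if (indexesCovering(type) == 0) {
            bagIndexes.put(type, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : bagIndexes.entrySet()) {
            if (subsumes(e.getKey(), type)) {
                e.getValue().add(fs);
            }
        }
    }

    // Number of indexes that currently hold 'fs'.
    int copiesStored(String fs) {
        int n = 0;
        for (List<String> bag : bagIndexes.values()) {
            if (bag.contains(fs)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        ToyIndexRepository repo = new ToyIndexRepository();
        repo.addType("t1", null);
        repo.addType("t2", "t1");
        repo.addToIndexes("t2", "FS2"); // creates a bag index for t2
        repo.addToIndexes("t1", "FS1"); // creates a bag index for t1, inherited by t2
        repo.addToIndexes("t2", "FS3"); // lands in both bag indexes
        System.out.println(repo.indexesCovering("t2")); // prints 2
        System.out.println(repo.copiesStored("FS3"));   // prints 2
    }
}
```

In this model, every FS of type t2 indexed after the t1 index exists is stored twice, which is the memory and CPU overhead I am worried about.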

The same issue exists when the user has defined an index for t2 but none for t1: indexing an FS of type t1 creates a bag index that t2 inherits as well. For high-performance annotators that create many objects, this can have a severe performance impact.

I would like to propose the following solution, though I'm not sure how difficult it will be to implement. We create a special kind of index that does not follow the usual type inheritance scheme (i.e., if it's defined for t1, it's not necessarily defined for t2). We create this index for all types without an index the first time somebody tries to index an FS whose type has no index defined. This global index will add some overhead to all indexing operations, but I hope it will be slight, and certainly less problematic than what we have now.
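
A rough Java sketch of one way to read this proposal follows. Everything in it is an assumption made for illustration: DefaultIndexSketch, defineIndex and the rest are made-up names rather than existing CAS API, and the bucket-per-exact-type layout is just one possible design. An FS goes into the special index only when no regular index applies to its type, and the special index is keyed by exact type, so nothing is inherited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these names are part of the real CAS API.
class DefaultIndexSketch {
    // Maps each type to its parent; root types have no entry.
    private final Map<String, String> parent = new HashMap<>();
    // Regular indexes: these follow type inheritance.
    private final Map<String, List<String>> definedIndexes = new LinkedHashMap<>();
    // Hypothetical special index: one bucket per exact type, no inheritance.
    private final Map<String, List<String>> defaultIndex = new HashMap<>();

    void addType(String type, String parentType) {
        if (parentType != null) {
            parent.put(type, parentType);
        }
    }

    // True if 'type' equals 'ancestor' or inherits from it.
    boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = parent.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Defines a regular index for 'type' (and, by inheritance, its subtypes).
    void defineIndex(String type) {
        definedIndexes.put(type, new ArrayList<>());
    }

    void addToIndexes(String type, String fs) {
        boolean covered = false;
        for (Map.Entry<String, List<String>> e : definedIndexes.entrySet()) {
            if (subsumes(e.getKey(), type)) {
                e.getValue().add(fs);
                covered = true;
            }
        }
        // Only FSs whose type has no regular index go to the special index,
        // whose bucket is created lazily on first use.
        if (!covered) {
            defaultIndex.computeIfAbsent(type, k -> new ArrayList<>()).add(fs);
        }
    }

    // Number of index structures that currently hold 'fs'.
    int copiesStored(String fs) {
        int n = 0;
        for (List<String> bag : definedIndexes.values()) {
            if (bag.contains(fs)) {
                n++;
            }
        }
        for (List<String> bag : defaultIndex.values()) {
            if (bag.contains(fs)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        DefaultIndexSketch repo = new DefaultIndexSketch();
        repo.addType("t1", null);
        repo.addType("t2", "t1");
        repo.addToIndexes("t2", "FS2"); // goes to the t2 bucket of the special index
        repo.addToIndexes("t1", "FS1"); // goes to the t1 bucket; t2 is unaffected
        repo.addToIndexes("t2", "FS3");
        System.out.println(repo.copiesStored("FS3")); // prints 1
    }
}
```

The extra cost in this sketch is the coverage check on every addToIndexes call, which is the kind of overhead I mean above. Retrieving all FSs of t1 including subtypes would have to visit several buckets, and the sketch does not address that.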

Opinions?

--Thilo
