On 1/2/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
<snip/> I'm not sure there's a contradiction between what I'm proposing, and what's in the spec proposal. When I run an Apache UIMA application, I make the decision what I want to see in my CAS. Any other application deserializing XMI files may do the same. In Apache UIMA, we can just call addToIndexes() on each FS we deserialize.
Hmmm... well, that last sentence is a key new idea I didn't get. Maybe that could be OK, but it does seem to have some strange effects. Say in Apache UIMA I create an FS of a type that has an index defined for it, but I never call addToIndexes() for that FS. The FS will not be in the index. Now say I create a reference to that FS so it's reachable from something that's indexed, then do an XMI serialization and deserialization. Now my FS will be in the index? Did I get that right? Also this seems to imply that we must allow the user to define some kind of "global" indexes where such FSs will go - right now we have no such thing.
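To make that scenario concrete, here is a toy model of it in plain Python. It is only a sketch and not the Apache UIMA API: the names `FS`, `reachable`, and `round_trip` are hypothetical, and it assumes that serialization writes out every FS reachable from an indexed FS. It compares the index contents after a round trip under two policies: indexing everything that was deserialized, versus restoring only the original index membership.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class FS:
    """Toy feature structure: a name plus references to other FSs."""
    name: str
    refs: list = field(default_factory=list)


def reachable(roots):
    """Return the set of FSs reachable from the given roots, roots included."""
    seen, stack = set(), list(roots)
    while stack:
        fs = stack.pop()
        if fs not in seen:
            seen.add(fs)
            stack.extend(fs.refs)
    return seen


def round_trip(indexed, index_everything):
    """Simulate serialize + deserialize; return the names indexed afterwards.

    Assumption: everything reachable from an indexed FS gets serialized.
    index_everything=True models "index each FS we deserialize";
    False models restoring only the original index membership.
    """
    serialized = reachable(indexed)
    restored = serialized if index_everything else set(indexed)
    return sorted(fs.name for fs in restored)


token = FS("token")                 # created, but never added to an index
sentence = FS("sentence", [token])  # indexed, and holds a reference to token

before = sorted(fs.name for fs in [sentence])
after_all = round_trip([sentence], index_everything=True)
after_same = round_trip([sentence], index_everything=False)

print(before)      # ['sentence']
print(after_all)   # ['sentence', 'token']
print(after_same)  # ['sentence']
```

Under the index-everything policy, `token` shows up in the index after the round trip even though it was never explicitly indexed, which is the effect being asked about.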
It's the user's decision what they want to see, and what they're not interested in. Indexes are a basic concept in Apache UIMA, and just like we require FSs to be indexed to be visible to the next annotator, we can expect FSs to be indexed to be visible to (de)serialization. Surely all the spec requires is that all FSs _can_ be deserialized.
Yes, that's what I was concerned about -- FSs that appear in the XMI but aren't in any view not being accessible. If we handle them by calling addToIndexes() on everything in the XMI this would no longer be a concern, but that raises other concerns as noted above.
>> > (2) If we decided to add some kind of global indexes to our
>> > implementation (as was being recently discussed), that has no
>> > representation in the XMI serialization. This seems like a problem to
>> > me. How can we add things to our implementation that are supposed to
>> > be persistent across CAS serialization without opening up a discussion
>> > of what the serialization format looks like?
>>
>> I didn't look at the details of the XMI proposal because frankly, I'm
>> not very interested in XML serialization. The conceptual part of the
>> report does not contradict that approach, at least the way I read it. I
>> probably missed something. Where does it say you can't have global
>> indexes (or the OASIS equivalent thereof)?
>
> The OASIS spec proposal only defines views. In our implementation we
> define indexes and say that we have an index repository per view and
> that the members of the view are indexed in that index repository. If
> a "global index" means an index containing objects that are in no
> view, then this approach no longer works.

What approach no longer works? The spec proposal or our implementation? The spec proposal says that a view is a set of FSs. It doesn't say (to my knowledge) that each FS must be contained in at least one view; nor does it say that it can be contained in only one view.
Sorry, miscommunication there - FS can be in more than one view. To me "global index" means something different than that the FS belongs to all views.
> I really think the XMI serialization is a key intersection point
> between what we're doing in our CAS implementation and what OASIS is
> charged with doing. Each group looks at the XMI serialization in a
> different way - for us developers, we may just want to throw in new
> attributes or whatever to make our latest, greatest implementation
> idea work. The people in the OASIS group (at least some of them) look
> at it as a realization of basic UIMA concepts. That may sometimes
> become an annoyance for us, since we might like to shape the XMI
> serialization however we want, but I think ultimately it's a good
> thing for UIMA.

Sure, but it leaves us with a lot of liberties. In the implementation, we can choose what kind of interfaces to the XMI form we offer to our users, and what extensions and conveniences we offer on top.
Yes, agreed.
Or, we could change the spec proposal.
Yes, I don't mean to rule that out. That would be done by proposing something over at OASIS, which I'm all for if we figure out what it is we want to change.

-Adam
