On Tue, Jul 8, 2008 at 6:08 PM, Marshall Schor <[EMAIL PROTECTED]> wrote:
>> The CAS heap and
>> in particular the way it grows is a major performance bottleneck
>> for large documents. If we have other parts of UIMA depend on
>> the (bad) implementation details now, we'll never be able to
>> improve on the design.
>
> Hmmm, I guess I was thinking that if we wanted to change this in the future,
> we could. I agree it would be more difficult; we've changed things like
> this before using the refactoring tools that let you see pretty clearly
> various dependencies...
>

I think what's happening here is that XmiCasSerializer requires the CAS to tell it efficiently which FSs are new (created since some mark - in this case, the point at which the CAS was received at a service). Currently this is doable using the FS addresses, but if that were no longer the case, we could build a different marking mechanism that does the same thing. So I think I'm agreeing with Marshall there.

That said, if checking an FS against the set of known FSs that were input to the service isn't a significant performance hit, then maybe that is the more flexible way to go, since it would avoid having a problem with this in the future. (That set already exists - it is built by the deserializer and used to ensure consistent ids.)

Hm - I seem to have added one more voice to the chorus of people recently asking to see some performance numbers before making a design decision. ;)

-Adam
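
P.S. To make the two options concrete, here is a rough sketch of what the serializer would need from either approach. None of these names exist in UIMA; FsMarkSketch, NewFsMarker, AddressMarker and KnownSetMarker are hypothetical, and FS addresses are modeled as plain ints. The first variant assumes addresses only ever grow, which is the implementation detail under discussion; the second assumes nothing about address order and just consults the set of incoming FSs.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: these types are hypothetical and not part of UIMA. */
final class FsMarkSketch {

    /** What the serializer needs: "was this FS created after the mark?" */
    interface NewFsMarker {
        boolean isNew(int fsAddr);
    }

    /** Option 1: relies on heap addresses being handed out in increasing order. */
    static final class AddressMarker implements NewFsMarker {
        private final int markAddr; // first address allocated after the mark

        AddressMarker(int markAddr) {
            this.markAddr = markAddr;
        }

        @Override
        public boolean isNew(int fsAddr) {
            // Anything at or above the mark was created after the CAS arrived.
            return fsAddr >= markAddr;
        }
    }

    /** Option 2: checks membership in the set of FSs that were input to the service. */
    static final class KnownSetMarker implements NewFsMarker {
        private final Set<Integer> knownAddrs;

        KnownSetMarker(Set<Integer> incomingAddrs) {
            // Defensive copy so later changes by the caller do not affect the mark.
            this.knownAddrs = new HashSet<>(incomingAddrs);
        }

        @Override
        public boolean isNew(int fsAddr) {
            // One hash lookup per FS; this is the cost that would need measuring.
            return !knownAddrs.contains(fsAddr);
        }
    }
}
```

If the serializer only ever talked to something like NewFsMarker, the heap layout could change later without touching it, and the two variants could be timed against each other on large documents.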
