>
> >
> > Doing this in the serialization code will not work. There is no way for
> > this to efficiently detect which existing FS have had feature values
> > changed. More importantly, it eliminates the ability to track CAS changes
> > for colocated annotators, something that has been repeatedly asked for to
> > improve debugging and to track provenance.
>
> Now wait a minute.  The current heap implementation can't
> do that either.  All we were talking about was to know which
> FSs were *added* since the CAS was serialized.  That is
> something you can do now by remembering the top heap position,
> and I am planning to support this with the new heap impl as
> well.  Knowing what FSs were *modified* is an entirely different
> proposition.


Right, recording the fact that old FS have been modified will require
changes. The ability to recognize old FS quickly is key, thanks. I was
mainly commenting that serialization was not a good place to do this stuff.
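
To make the added-versus-modified distinction concrete, here is a minimal
sketch of the "remember the top heap position" idea. It is not the UIMA
heap implementation; the class HeapMarkSketch, its methods, and the heap
layout (one type-code cell followed by the feature cells) are all made up
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical append-only int heap, NOT the actual UIMA CAS heap.
class HeapMarkSketch {
    private int[] heap = new int[16];
    private int top = 1;                 // cell 0 is reserved so address 0 can mean "null"
    private int mark = 1;                // top heap position at the last serialization
    private final List<Integer> fsAddresses = new ArrayList<>();

    // Append an FS: one cell for the type code, then one cell per feature value.
    int addFs(int typeCode, int... featureValues) {
        int size = 1 + featureValues.length;
        if (top + size > heap.length) {
            heap = Arrays.copyOf(heap, Math.max(heap.length * 2, top + size));
        }
        int addr = top;
        heap[top++] = typeCode;
        for (int v : featureValues) {
            heap[top++] = v;
        }
        fsAddresses.add(addr);
        return addr;
    }

    // Call when the CAS is serialized: remember the current top of the heap.
    void markSerialized() {
        mark = top;
    }

    // An FS is "new" exactly when it was allocated at or above the mark.
    boolean isNew(int addr) {
        return addr >= mark;
    }

    List<Integer> addedSinceMark() {
        List<Integer> added = new ArrayList<>();
        for (int addr : fsAddresses) {
            if (isNew(addr)) {
                added.add(addr);
            }
        }
        return added;
    }

    // Updating a feature of an old FS leaves no trace in the mark.
    void setFeature(int addr, int featureOffset, int value) {
        heap[addr + 1 + featureOffset] = value;
    }

    int getFeature(int addr, int featureOffset) {
        return heap[addr + 1 + featureOffset];
    }
}
```

Calling setFeature on an FS below the mark changes its value, but isNew
still reports false for it. Detecting that kind of change would need extra
bookkeeping at the point of modification, which is the part that requires
changes.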


>
> >>> Given no warning against doing this from an application, the fact that
> >>> it works and that it is fairly intuitive to do so means that there are
> >>> likely existing UIMA applications doing it. Of course we all are willing
> >>> to break existing user code when it gets in the way of some neat
> >>> improvement :)
> >> So you agree that maintaining this behavior is not a requirement?
> >
> >
> > No, not without further discussion.
>
> Maybe we should call for a vote?


Sure. What exactly are we voting for: breaking this just for remote
annotators, or for all annotators?


>
> >>> Blob serialization, like the binary serialization used between C++ and
> >>> Java, leaves the Java Cas with a string heap rather than a string list.
> >>> It would be easy to change blob deserialization to recreate a string
> >>> list instead, and measure the performance difference.
> >> I'll take your word for it, though I still don't see what this
> >> has to do with what we were talking about.  In the new heap I'm
> >> thinking about, there will be no such thing as a String heap or
> >> list.  Strings will just be referenced directly from the objects
> >> representing FSs.
> >>
> >
> > It sounds like you have no concern for binary serialization performance.
>
> I don't know what makes you say that.  That is not the
> impression I wanted to give, at least ;-)  I'll admit
> it's not my primary concern.  To repeat: I simply do not
> understand what you mean to show by your string heap vs.
> string list test.  I'm not unwilling, just intellectually
> incapable.


My concern is that deserializing FS into a single int array is much faster
than creating individual Java objects for each FS; same for strings, so
doing a simple experiment with strings would be relevant. Maybe I am
completely confused?
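
To illustrate the two shapes of deserialization I have in mind, here is a
rough sketch. It is not UIMA code and it is not a benchmark; the class
DeserializeSketch, the FsObject type, and the fixed three-int layout per FS
(type code, begin, end) are assumptions chosen only to keep the example
small.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of two deserialization targets, NOT UIMA code.
class DeserializeSketch {
    // Assumed layout: every FS occupies exactly 3 ints (type code, begin, end).
    static final int INTS_PER_FS = 3;

    static final class FsObject {
        final int type;
        final int begin;
        final int end;

        FsObject(int type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Heap style: one array allocation and one bulk copy for all FSs.
    static int[] toIntHeap(ByteBuffer buf) {
        int[] heap = new int[buf.remaining() / Integer.BYTES];
        buf.asIntBuffer().get(heap);
        return heap;
    }

    // Object style: one Java object allocated per FS.
    static List<FsObject> toObjects(ByteBuffer buf) {
        IntBuffer ints = buf.asIntBuffer();
        int count = ints.remaining() / INTS_PER_FS;
        List<FsObject> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(new FsObject(ints.get(), ints.get(), ints.get()));
        }
        return out;
    }

    // Build a sample serialized blob holding fsCount FSs.
    static ByteBuffer sample(int fsCount) {
        ByteBuffer buf = ByteBuffer.allocate(fsCount * INTS_PER_FS * Integer.BYTES);
        for (int i = 0; i < fsCount; i++) {
            buf.putInt(7).putInt(i).putInt(i + 1);
        }
        buf.flip();
        return buf;
    }
}
```

Timing toIntHeap against toObjects on a large sample would be the FS
analogue of the string heap versus string list experiment. How big the
difference is would have to be measured; the sketch only shows where the
per-FS allocations come from.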


> > Changing the heap design to enable garbage collection at the expense of
> > seriously degrading performance for existing users that are strongly
> > dependent on efficient CAS serialization does not sound viable.
>
> I agree completely.  If this turns out to seriously degrade
> performance for *any* important scenario, it's out.  However,
> I'm not sure it will degrade performance, not even for binary
> serialization.  Otherwise I wouldn't be suggesting this.
>

Oh good, my worries are over :)

Eddie
