Re: New CAS heap impl?

Eddie Epstein Mon, 22 Oct 2007 15:52:11 -0700

>
> > With the current design, the top of the FS heap position on calling process
> > is used to identify new versus preexisting FS during or after the call: just
> > compare any FS address to that position to know if it is new or not.
>
> I can copy this behavior in the new implementation, but
> do we really want to rely on this and make it part of the
> design of the CAS and its heap?  Currently, this is a property
> of the implementation, but not something I ever considered
> to be part of the external contract of the CAS implementation.
>
> It only works because the heap doesn't do any garbage collection,
> and consequently no heap compaction.  It's not like that because
> I thought that was a particularly good idea, but simply because
> it would have been difficult to implement.  So it's a restriction
> of the implementation, and not something that necessarily has to be
> preserved in the future.



Copying the behavior would be appropriate, unless there is some other way to
easily distinguish pre-existing FS.
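
To make the comparison concrete, here is a minimal sketch of the top-of-heap check. ToyFsHeap and its methods are hypothetical names for illustration, not the actual CAS heap classes, and the sketch assumes an append-only heap with no compaction, which is exactly the property under discussion:

```java
import java.util.Arrays;

// Toy append-only heap for illustration; NOT the real UIMA CAS heap.
class ToyFsHeap {
    private int[] cells = new int[16];
    private int top = 1; // address 0 is reserved to mean "null"

    // Allocate `size` cells and return the address of the new FS.
    int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        while (top + size > cells.length) {
            cells = Arrays.copyOf(cells, cells.length * 2);
        }
        int addr = top;
        top += size;
        return addr;
    }

    // Current top of heap; record it before a call to use as the watermark.
    int mark() {
        return top;
    }

    // An FS is new relative to a watermark if it sits at or above it.
    // This only holds while addresses never move (no GC, no compaction).
    static boolean isNew(int addr, int watermark) {
        return addr >= watermark;
    }

    public static void main(String[] args) {
        ToyFsHeap heap = new ToyFsHeap();
        int before = heap.allocate(3);    // created by the caller
        int watermark = heap.mark();      // taken just before the call
        int after = heap.allocate(3);     // created during the call
        System.out.println(isNew(before, watermark)); // false
        System.out.println(isNew(after, watermark));  // true
    }
}
```

Any heap that moves or reclaims FSs would need some other marker, for example a per-FS flag or a separate set of new IDs.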


>
> > Consider the following code:
> >         AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
> >         CAS cas = ae.newCAS();
> >         cas.setDocumentText("some text");
> >         AnnotationFS fs = cas.createAnnotation(cas.getAnnotationType(), 0, 4);
> >         ae.process(cas);
> >         System.out.println(fs.getCoveredText());
> >
> > Preexisting fs in the client must be valid after a process call, no?
>
> No.  I've been over this with Adam on one of the OASIS calls, too.
> It happens to work in the current implementation, but nowhere do
> we guarantee this or suggest that this should work.  To the contrary,
> we always tell people not to keep FS references across process calls.
> The design I am planning on may break this code.  I will guarantee
> that int IDs of FSs are constant for serialization/deserialization,
> but I won't necessarily keep the objects around.  So if the CAS was
> sent over the wire, the object may no longer be valid.  If the
> deployment is all local, it will continue to work (unless the FS
> has been deleted by one of the annotators).


Changing behavior for remote versus colocated annotators is not a good idea.
As for telling people not to keep application references, the only
documentation we have for that [that I have seen] has to do with code inside
an annotator process method. Specifically:

The JCas will be cleared between calls to your annotator's process() method.
All of the analysis results related to the previous document will be deleted
to make way for analysis of a new document. Therefore, you should never save
a reference to a JCas Feature Structure object (i.e. an instance of a class
created using JCasGen) and attempt to reuse it in a future invocation of the
process() method. If you do so, the results will be undefined.

Given no warning against doing this from an application, the fact that it
works and that it is fairly intuitive to do so means that there are likely
existing UIMA applications doing it. Of course we all are willing to break
existing user code when it gets in the way of some neat improvement :)

> It was the second paragraph that I didn't understand.
>

Blob serialization, like the binary serialization used between C++ and Java,
leaves the Java Cas with a string heap rather than a string list. It would
be easy to change blob deserialization to recreate a string list instead,
and measure the performance difference.
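
As a rough illustration of the distinction, the sketch below contrasts the two layouts. It is a toy model with hypothetical class names (ToyStringStores, CharHeap, StringList), not the UIMA string heap code, and it assumes "string heap" means characters packed into one buffer addressed by offset and length, while "string list" means one String object per entry:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the two string storage layouts; NOT the UIMA classes.
class ToyStringStores {

    // "String heap": all characters live in one buffer, and each string
    // is addressed by an (offset, length) pair.
    static class CharHeap {
        private final StringBuilder chars = new StringBuilder();
        private final List<int[]> refs = new ArrayList<>();

        int add(String s) {
            refs.add(new int[] {chars.length(), s.length()});
            chars.append(s);
            return refs.size() - 1;
        }

        // Builds a new String object on every read.
        String get(int ref) {
            int[] r = refs.get(ref);
            return chars.substring(r[0], r[0] + r[1]);
        }
    }

    // "String list": one String object is kept per entry.
    static class StringList {
        private final List<String> strings = new ArrayList<>();

        int add(String s) {
            strings.add(s);
            return strings.size() - 1;
        }

        // Returns the stored object; no copy is made.
        String get(int ref) {
            return strings.get(ref);
        }
    }

    public static void main(String[] args) {
        CharHeap heap = new CharHeap();
        StringList list = new StringList();
        int h = heap.add("some text");
        int l = list.add("some text");
        System.out.println(heap.get(h).equals(list.get(l)));
    }
}
```

In this toy model the packed buffer is cheap to fill in bulk, but each read allocates a String, while the list pays for one object per entry up front and then reads for free. Which one wins in practice is the measurement proposed above.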

Eddie
