Re: New CAS heap impl?

Thilo Goetz Tue, 23 Oct 2007 01:58:10 -0700

Eddie Epstein wrote:
[...]
>>> With the current design, the top of the FS heap position on calling process
>>> is used to identify new versus preexisting FS during or after the call: just
>>> compare any FS address to that position to know if it is new or not.
>> I can copy this behavior in the new implementation, but
>> do we really want to rely on this and make it part of the
>> design of the CAS and its heap?  Currently, this is a property
>> of the implementation, but not something I ever considered
>> to be part of the external contract of the CAS implementation.
>>
>> It only works because the heap doesn't do any garbage collection,
>> and consequently no heap compaction.  It's not like that because
>> I thought that was a particularly good idea, but simply because
>> it would have been difficult to implement.  So it's a restriction
>> of the implementation, and not something that necessarily has to be
>> preserved in the future.
> 
> 
> Copying the behavior would be appropriate, unless there is some other way to
> easily distinguish pre-existing FS.


To my mind, the place to keep track of something like that
is the serialization code.  It has to iterate over the whole
CAS anyway and can do that kind of tracking.  It seems wrong
to put that kind of requirement on the heap implementation.

With the new kind of implementation I have in mind, this
information will still be available.  For future development,
it would be better not to rely on heap implementation details.
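
As a rough illustration of tracking this in the serialization code
rather than in the heap, here is a minimal sketch. The class and method
names (DeltaTracker, recordSent, isNew, newIds) are hypothetical and are
not part of the UIMA API. The sketch only assumes that the int IDs of
FSs stay constant across serialization and deserialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not UIMA code: the serializer remembers what it sent. */
public class DeltaTracker {
    // IDs of the feature structures that existed when the CAS was serialized.
    private final Set<Integer> sentIds = new HashSet<>();

    /** Called by the serializer for each FS it visits while walking the CAS. */
    public void recordSent(int fsId) {
        sentIds.add(fsId);
    }

    /** True if the FS was not in the CAS at serialization time. */
    public boolean isNew(int fsId) {
        return !sentIds.contains(fsId);
    }

    /** Filters the IDs found after the call down to the newly created ones. */
    public List<Integer> newIds(List<Integer> idsAfterCall) {
        List<Integer> result = new ArrayList<>();
        for (int id : idsAfterCall) {
            if (isNew(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        tracker.recordSent(1);
        tracker.recordSent(2);
        // IDs 7 and 3 were not sent, so they count as new regardless of
        // where they sit in memory or how their values compare to old IDs.
        System.out.println(tracker.newIds(Arrays.asList(1, 2, 7, 3)));
    }
}
```

Unlike a comparison against the top-of-heap position, this does not
depend on new FSs getting higher addresses than old ones, so it would
keep working if the heap were ever compacted or garbage collected.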

> 
> 
>>> Consider the following code:
>>>         AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
>>>         CAS cas = ae.newCAS();
>>>         cas.setDocumentText("some text");
>>>         AnnotationFS fs = cas.createAnnotation(cas.getAnnotationType(), 0, 4);
>>>         ae.process(cas);
>>>         System.out.println(fs.getCoveredText());
>>>
>>> Preexisting fs in the client must be valid after a process call, no?
>> No.  I've been over this with Adam on one of the OASIS calls, too.
>> It happens to work in the current implementation, but nowhere do
>> we guarantee this or suggest that this should work.  To the contrary,
>> we always tell people not to keep FS references across process calls.
>> The design I am planning on may break this code.  I will guarantee
>> that int IDs of FSs are constant for serialization/deserialization,
>> but I won't necessarily keep the objects around.  So if the CAS was
>> sent over the wire, the object may no longer be valid.  If the
>> deployment is all local, it will continue to work (unless the FS
>> has been deleted by one of the annotators).
> 
> 
> Changing behavior for remote versus colocated annotators is not a good idea.
> As for telling people not to keep application references, the only
> documentation we have for that [that I have seen] has to do with code inside
> an annotator process method. Specifically:
> 
> The JCas will be cleared between calls to your annotator's process() method.
> All of the analysis results related to the previous document will be deleted
> to make way for analysis of a new document. Therefore, you should never save
> a reference to a JCas Feature Structure object (i.e. an instance of a class
> created using JCasGen) and attempt to reuse it in a future invocation of the
> process() method. If you do so, the results will be undefined.
> 
> Given no warning against doing this from an application, the fact that it
> works and that it is fairly intuitive to do so means that there are likely
> existing UIMA applications doing it. Of course we all are willing to break
> existing user code when it gets in the way of some neat improvement :)

So you agree that maintaining this behavior is not a requirement?

> 
> It was the second paragraph that I didn't understand.
> 
> Blob serialization, like the binary serialization used between C++ and Java,
> leaves the Java Cas with a string heap rather than a string list. It would
> be easy to change blob deserialization to recreate a string list instead,
> and measure the performance difference.

I'll take your word for it, though I still don't see what this
has to do with what we were talking about.  In the new heap I'm
thinking about, there will be no such thing as a String heap or
list.  Strings will just be referenced directly from the objects
representing FSs.

--Thilo

> 
> Eddie
>
