On 1/8/07, Marshall Schor <[EMAIL PROTECTED]> wrote:
Adam Lally wrote:
> Hmmm... I'm curious how it's possible to merge CAS and JCas,
> especially if we need to do it without sacrificing any performance.
> For example, JCas has somewhat expensive initialization where it tries
> to load classes for each type in the type system.  We don't currently
> pay that cost if JCas is never used.  But if JCas and CAS are merged,
> how do we know if the user needs to have these classes loaded?  That's
> only one particular issue.

I haven't worked out all the details, but this is how it might work.
Let's assume the framework might know, somehow, if JCas was being used by
a particular component in the chain of annotators running "locally" (not
remote), in Java.  Then the framework could do a getJCas() call before
calling that component's process method, to load up the "generators".
Otherwise, it could just never do the getJCas() call, and you would never
pay the overhead penalty of loading the JCas cover classes.
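
As a rough illustration of the idea above, here is a minimal Java sketch of a framework that triggers the expensive cover-class loading only when a component in the chain is known to use JCas. Every name in it (SketchCas, SketchComponent, SketchPipeline, usesJCas) is a hypothetical stand-in, not the real UIMA API, and how the framework would actually learn the usesJCas answer is exactly the open question in this thread.

```java
import java.util.List;

// Hypothetical stand-in for the CAS; not the real UIMA class.
final class SketchCas {
    private boolean coverClassesLoaded = false;
    int loadCount = 0; // counts how often the expensive step actually ran

    // Simulates the costly initialization (loading a class per type).
    // It runs at most once per CAS instance.
    SketchCas getJCas() {
        if (!coverClassesLoaded) {
            coverClassesLoaded = true;
            loadCount++;
        }
        return this;
    }

    boolean coverClassesLoaded() {
        return coverClassesLoaded;
    }
}

// Hypothetical component contract; usesJCas() is assumed to be
// answerable by the framework "somehow".
interface SketchComponent {
    boolean usesJCas();
    void process(SketchCas cas);
}

final class SketchPipeline {
    // Calls getJCas() only in front of components that need it, so a
    // chain with no JCas users never pays the loading cost.
    static void run(List<SketchComponent> chain, SketchCas cas) {
        for (SketchComponent component : chain) {
            if (component.usesJCas()) {
                cas.getJCas();
            }
            component.process(cas);
        }
    }
}
```

With this shape, a chain made up only of non-JCas components leaves coverClassesLoaded() false, and a chain with several JCas components still loads only once.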


Hmmm.. I thought in your proposal there wouldn't even be a getJCas()
call... since there's no more JCas.  What type does getJCas() return -
CAS?

    Of course, I haven't figured out how to see if a component is using
    the JCas...  That might take a bit of thought.   :-)  The hard thing
    to handle would be which Java objects the generator(s) would
    produce: before JCas is set up, these produce mostly generic objects
    for feature structures.  After JCas is set up, these produce the
    JCas cover objects.


We already face this situation.  I don't think it matters much because
the objects aren't cached.  So if a non-JCas annotator creates them,
and then the CAS is passed to a JCas-annotator, when that
JCas-annotator accesses things it will get JCas cover objects.
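
To make the "not cached" point concrete, here is a small Java sketch under stated assumptions: GenericFs, TokenCover, and SketchHeap are hypothetical names, not UIMA classes, and the single jcasReady flag is a simplification of whatever the real generators consult. It shows only that, without a cache, each access builds a fresh wrapper using whichever generator is current.

```java
// Hypothetical generic feature-structure wrapper around a heap address.
class GenericFs {
    final int addr;

    GenericFs(int addr) {
        this.addr = addr;
    }
}

// Hypothetical JCas-style cover object for the same underlying data.
final class TokenCover extends GenericFs {
    TokenCover(int addr) {
        super(addr);
    }
}

final class SketchHeap {
    private boolean jcasReady = false;

    // Stands in for the point at which JCas gets set up.
    void setUpJCas() {
        jcasReady = true;
    }

    // No cache: every access creates a new wrapper, so objects created
    // earlier by a non-JCas annotator do not leak to later accesses.
    GenericFs access(int addr) {
        return jcasReady ? new TokenCover(addr) : new GenericFs(addr);
    }
}
```

Accessing the same address before and after setUpJCas() yields a generic object first and a cover object afterwards.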

    My guess is we'd have to have some new metadata indicating JCas was
    being used.  This would move the indication from where it is now
    (discoverable by looking at the argument type of the process method)
    to the component metadata (and make it explicit).


Given that we need this information somewhere, I think I may actually
prefer it in the code.  Right now it's nice and clear from the code
which interface style is being used, and it can't get out of sync.
The proposed change would have one interface "CAS", but two different
styles of accessing it, and users would have to put in their
descriptor what style they were using, or else their code would not
work.  This seems like asking for trouble.
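
For illustration, the "discoverable from the code" approach could look roughly like the Java sketch below, which uses reflection to inspect the argument type of a process method. SketchCasApi, SketchJCasApi, the two annotator classes, and StyleProbe are hypothetical stand-ins rather than UIMA types; only the java.lang.reflect calls are real. The point is that the information lives in the method signature, so it cannot drift out of sync with a descriptor.

```java
import java.lang.reflect.Method;

// Hypothetical marker types for the two interface styles.
interface SketchCasApi { }
interface SketchJCasApi { }

// Hypothetical annotators; the style is visible in the signature.
final class PlainAnnotator {
    public void process(SketchCasApi cas) { }
}

final class CoverAnnotator {
    public void process(SketchJCasApi jcas) { }
}

final class StyleProbe {
    // Returns true if the class has a public one-argument method named
    // "process" whose parameter type is the JCas-style interface.
    static boolean usesJCas(Class<?> annotatorClass) {
        for (Method method : annotatorClass.getMethods()) {
            if (method.getName().equals("process")
                    && method.getParameterCount() == 1
                    && method.getParameterTypes()[0] == SketchJCasApi.class) {
                return true;
            }
        }
        return false;
    }
}
```

If both styles shared one "CAS" parameter type, as in the proposed change, a probe like this would have nothing to distinguish, which is why separate metadata would then be needed.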

>> 3) Add a new interface called "CasViewSelector".  This interface has
>> just the 2-3 methods that select a view.  This interface is passed to
>> process methods.
>>
<snip/>
> Actually, forgetting about the spec for a second, what do we say in
> our documentation is the thing that carries the analysis data between
> annotators?  If we still say that's called a "CAS", and that the thing
> that's serialized and sent between remote annotators is still a "CAS",
> then this just doesn't seem consistent with this suggested naming of
> interfaces.

What this proposal is doing is using the same word, CAS, for all of these
things.   CAS is what is passed.  CASes contain views, and you can get
one (or more) views.  Each view has things you can do - represented by
the CAS Interface.

Perhaps I read code too literally, but in the proposal an annotator's
process method looks like process(CasViewSelector).  To me this means
that a "CasViewSelector" is what is passed, not a "CAS".  In an ideal
world, these would be the same.

Of course sometimes we pass "JCas" and that doesn't seem as bad to me.
I guess the concept of giving annotators different interfaces to the
CAS seemed simpler and easier to explain, compared to the idea of
giving a CasViewSelector which allows you to access different CAS
objects (where each represents a view).  The latter is doubly bad
because not only does it not use "CAS" to refer to what's passed to
the process method, it does use CAS to describe something *different*.
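
A minimal Java sketch of the naming issue discussed above, with loud caveats: the proposal only says the selector has "2-3 methods that select a view", so the method names getView and createView, and all the Sketch* types here, are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// In the proposal this per-view object is what would be called "CAS".
interface SketchView {
    String name();
}

// Hypothetical shape of the proposed selector; method names are assumed.
interface SketchCasViewSelector {
    SketchView getView(String name);
    SketchView createView(String name);
}

final class MapBackedSelector implements SketchCasViewSelector {
    private final Map<String, SketchView> views = new HashMap<>();

    @Override
    public SketchView getView(String name) {
        SketchView view = views.get(name);
        if (view == null) {
            throw new IllegalArgumentException("no such view: " + name);
        }
        return view;
    }

    @Override
    public SketchView createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalArgumentException("view already exists: " + name);
        }
        SketchView view = () -> name;
        views.put(name, view);
        return view;
    }
}

final class SketchAnnotator {
    // What is passed to process is the selector, not the thing named "CAS".
    String process(SketchCasViewSelector selector) {
        return selector.getView("source").name();
    }
}
```

Read literally, the signature says the annotator receives a selector, and the word "CAS" ends up attached to the per-view object instead.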

-Adam
