Re: Getting annotations from CASes 'external' to a pipeline

Marshall Schor Thu, 15 Mar 2012 15:00:04 -0700


On 3/15/2012 4:38 PM, Eddie Epstein wrote:

Cannot deserialize into a CAS from getEmptyCas().

This is not right.  More information soon (ran out of time today). -Marshall

Must use a CAS from CasCreationUtils.createCas for deserialization, and
then use CasCopier to copy to the CAS from getEmptyCas().

Pick the version of createCas that specifies a typesystem, and use the
typesystem from the pipeline CAS (i.e. the one from getEmptyCas).

On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling wrote:

Thanks, guys.  This is getting me closer to the goal, and explains the
observed behaviors.  Now I'm facing issues when implemented as a CAS
Multiplier.  I try creating a new CAS first with getEmptyJCas().

Here are the strategies I tried and what resulted:

  * create a deserializer with the typesystem from the AE (which
        includes types in the 'external' CAS to be deserialized)
  * use it to deserialize into the empty CAS created with getEmptyJCas()

  ->  The deserialized CAS for some reason has only the base TOP typesystem
  ->  Trying to access an annotation from an index (that should be there)
        generates the "used in Java code, but was not declared in the XML
        type descriptor" exception

  * same as above, but use CasCopier to try and copy the type system
        (and everything else) from the CAS in the AE's process() method
        into the empty CAS

  ->  Attempted to copy a FeatureStructure of type "(my type name)", which
        is not defined in the type system of the destination CAS.

It seems the ONLY way to obtain a CAS (empty or otherwise) that has the type
system able to accept the external CAS being deserialized is to use the very
CAS passed into the AE's process() method.  Doing so obviously mangles that
CAS for the rest of the pipeline.


On 3/15/2012 1:50 PM, Marshall Schor wrote:


On 3/15/2012 10:38 AM, Eric Riebling wrote:

I have a pipeline with its own type system.
I also have serialized, annotated CASes on disk with a different type
system.
Suppose I want an Analysis Engine in the pipeline to read in the serialized
CASes in order to obtain annotations and 'do things with them'.

I understand some limitations in the UIMA framework prevent this, but
could it be done by making the first type system include that of the
CASes to deserialize?

Yes, I think so.


Also, it would necessitate creating new CASes within the Analysis Engine.
I could think of several approaches, and have tried some without success:

* Create a new, 'temporary' View in the AE's process() method, obtain a
JCas, obtain its CAS, and use that to store the deserialized CASes
(seems to mangle the original CAS and break downstream AEs in the pipeline,
and seems to not be able to find any annotations in the deserialized CAS)

This won't work. The deserialize method effectively "resets" the CAS
before loading it.
A view is not a new CAS; it is a new view of the same CAS.
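
To illustrate why the temporary-view approach loses the original annotations, here is a toy Java model of the behavior described above. ToyCas and its methods are hypothetical stand-ins invented for this sketch and are not UIMA classes. The only assumptions carried over from the thread are that deserializing resets the whole CAS and that all views share one underlying CAS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: ToyCas is hypothetical and is NOT part of the UIMA API.
class ToyCasDemo {
    public static void main(String[] args) {
        ToyCas pipelineCas = new ToyCas();
        pipelineCas.getView("_InitialView").add("Question");

        // A "temporary" view is just another named slot inside the same CAS.
        pipelineCas.getView("temp");

        // Deserializing resets the whole CAS first, so every existing view is lost.
        pipelineCas.deserialize(Map.of("_InitialView", List.of("ExternalAnnotation")));

        System.out.println(pipelineCas.getView("_InitialView"));
        System.out.println(pipelineCas.viewNames());
    }
}

class ToyCas {
    // View name -> annotations held in that view; all views live in one CAS.
    private final Map<String, List<String>> views = new LinkedHashMap<>();

    // Returns the named view, creating it inside this same CAS if needed.
    List<String> getView(String name) {
        return views.computeIfAbsent(name, k -> new ArrayList<>());
    }

    List<String> viewNames() {
        return new ArrayList<>(views.keySet());
    }

    // Clears every view, modelling the "reset" that precedes loading.
    void reset() {
        views.clear();
    }

    // Models deserialization: reset first, then load the serialized content.
    void deserialize(Map<String, List<String>> serialized) {
        reset();
        serialized.forEach((name, annotations) -> getView(name).addAll(annotations));
    }
}
```

In this model the "Question" annotation and the "temp" view are both gone after deserialize(), which matches the mangled pipeline CAS reported above. A separate CAS instance avoids the problem because its reset touches nothing the pipeline owns.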

* Use the CAS in the process() method to store the deserialized CASes
(also mangles the original CAS, breaks downstream AEs, but DOES
permit obtaining annotations from the deserialized CASes)

Right, deserializing into an existing CAS resets it in flight.


* Make the Analysis Engine be a CAS Multiplier, and deserialize into
a CAS created with getEmptyCAS()
(I haven't tried this yet)

Yes, this is the way to get a separate CAS instance to deserialize into.
It's how Collection Readers do it.
-Marshall


It's kind of a use case for a hybrid Component that behaves in some ways
like an AE (has a process() method), in some ways like an XMI Collection
Reader, and in some ways like a CAS Multiplier.

But it's a useful use case! It is also a very bizarre one because you could
almost think of it as a pipeline within a pipeline, which processes a set
of deserialized annotated XMI documents, within a pipeline that processes ...
in our case, a Question Answering system with question keyterms,
ranked lists of documents and answer candidates.
