On 3/15/2012 10:38 AM, Eric Riebling wrote:
I have a pipeline with its own type system.
I also have deserialized, annotated CASes on disk with a different type system.
Suppose I want an Analysis Engine in the pipeline to read in the deserialized
CASes in order to obtain annotations and 'do things with them'.
I understand some limitations in the UIMA framework prevent this, but
could it be done by making the first type system include that of the
CASes to deserialize?
Yes, I think so.
Also, it would necessitate creating new CASes within the Analysis Engine.
I could think of several approaches, and have tried some without success:
* Create a new, 'temporary' View in the AE's process() method, obtain a
JCas, obtain its CAS, and use that to store the deserialized CASes
(seems to mangle the original CAS and break downstream AEs in the pipeline,
and seems to not be able to find any annotations in the deserialized CAS)
This won't work. The deserialize method effectively "resets" the CAS before
loading it.
A view is not a new CAS; it is a new view of the same CAS.
* Use the CAS in the process() method to store the deserialized CASes
(also mangles the original CAS, breaks downstream AEs, but DOES
permit obtaining annotations from the deserialized CASes)
Right, deserializing into an existing CAS resets it in flight.
* Make the Analysis Engine be a CAS Multiplier, and deserialize into
a CAS created with getEmptyCAS()
(I haven't tried this yet)
Yes, this is the way to get a separate CAS instance to deserialize into. It's
how Collection Readers do it.
-Marshall
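As a rough illustration of the three approaches discussed above, here is a toy model in plain Java. ToyCas, CasResetDemo and all of their methods are hypothetical stand-ins, not UIMA classes. The only behaviour modelled is what the replies describe: views share one CAS, and deserializing resets the whole CAS before loading. In real UIMA code the separate instance would come from getEmptyCAS() in a CAS Multiplier, and the loading from something like XmiCasDeserializer.deserialize(InputStream, CAS).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a CAS. NOT the UIMA API, just enough to show the reset behaviour. */
class ToyCas {
    // A single store per CAS, keyed by view name: views are not separate CASes.
    private final Map<String, List<String>> annotationsByView = new HashMap<>();

    /** Adds an annotation to the named view, creating the view if needed. */
    void add(String view, String annotation) {
        annotationsByView.computeIfAbsent(view, v -> new ArrayList<>()).add(annotation);
    }

    /** Returns the annotations in a view (empty if the view does not exist). */
    List<String> annotations(String view) {
        return annotationsByView.getOrDefault(view, new ArrayList<>());
    }

    /** Models deserialization: reset the whole CAS first, then load the saved content. */
    void deserialize(Map<String, List<String>> saved) {
        annotationsByView.clear(); // the reset wipes every view, not just one
        saved.forEach((view, anns) -> annotationsByView.put(view, new ArrayList<>(anns)));
    }
}

class CasResetDemo {
    /** First two bullets: load the saved content into the pipeline's own CAS. */
    static ToyCas loadIntoPipelineCas(ToyCas pipelineCas, Map<String, List<String>> saved) {
        pipelineCas.deserialize(saved);
        return pipelineCas;
    }

    /** Third bullet: load the saved content into a separate CAS instance. */
    static ToyCas loadIntoSeparateCas(Map<String, List<String>> saved) {
        ToyCas separate = new ToyCas();
        separate.deserialize(saved);
        return separate;
    }

    public static void main(String[] args) {
        Map<String, List<String>> saved = new HashMap<>();
        saved.put("default", Arrays.asList("AnswerCandidate"));

        ToyCas pipelineCas = new ToyCas();
        pipelineCas.add("default", "QuestionKeyterm");
        pipelineCas.add("temp", "Scratch"); // the 'temporary' view from the first bullet

        // Separate instance: the pipeline's annotations survive.
        ToyCas loaded = loadIntoSeparateCas(saved);
        System.out.println(pipelineCas.annotations("default")); // [QuestionKeyterm]
        System.out.println(loaded.annotations("default"));      // [AnswerCandidate]

        // Same instance: the pipeline's annotations and the temporary view are gone.
        loadIntoPipelineCas(pipelineCas, saved);
        System.out.println(pipelineCas.annotations("default")); // [AnswerCandidate]
        System.out.println(pipelineCas.annotations("temp"));    // []
    }
}
```

In this model, the temporary view offers no protection because it lives in the same store that the reset clears; only a second instance keeps the pipeline's own annotations intact for downstream AEs.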
It's kind of a use case for a hybrid Component that behaves in some ways like
an AE (has a process() method), in some ways like XMI Collection Reader, and
in some ways like a CAS Multiplier.
But it's a useful use case! It is also a very bizarre one because you could
almost think of it as a pipeline within a pipeline, which processes a set
of deserialized annotated XMI documents, within a pipeline that processes ...
in our case, a Question Answering system with question keyterms,
ranked lists of documents and answer candidates.