Cannot deserialize into a CAS from getEmptyCas(). Must use a CAS from CasCreationUtils.createCas for deserialization, and then use CasCopier to copy to the CAS from getEmptyCas().
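
A minimal sketch of that approach, assuming the external CASes are stored as XMI. The class name ExternalCasLoader and the method loadInto are hypothetical, and passing null for the type priorities and index descriptions assumes only the built-in indexes are needed:

```java
import java.io.InputStream;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.util.CasCopier;
import org.apache.uima.util.CasCreationUtils;

/** Hypothetical helper; not part of UIMA. */
public final class ExternalCasLoader {

    private ExternalCasLoader() {}

    /**
     * Deserializes XMI into a scratch CAS that uses the pipeline CAS's type
     * system, then copies the result into the target CAS (for example, the
     * CAS handed out by the CAS Multiplier).
     */
    public static void loadInto(CAS pipelineCas, InputStream xmi, CAS target)
            throws Exception {
        // Scratch CAS built from the pipeline's type system. The null arguments
        // (type priorities, index descriptions) are an assumption: no custom
        // priorities or indexes are required.
        CAS scratch = CasCreationUtils.createCas(pipelineCas.getTypeSystem(), null, null);

        // Deserialization resets the CAS it loads into; that is harmless for
        // a scratch CAS that nothing else in the pipeline is using.
        XmiCasDeserializer.deserialize(xmi, scratch);

        // Copy the sofa data and indexed feature structures into the target.
        CasCopier.copyCas(scratch, target, true);
    }
}
```

Inside a CAS Multiplier this would be called with the CAS passed to process() as pipelineCas and the freshly obtained empty CAS as target.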
Pick the version of createCas that specifies a typesystem, and use the typesystem from the pipeline CAS (i.e. the one from getEmptyCas).

On Thu, Mar 15, 2012 at 2:44 PM, Eric Riebling wrote:
> Thanks, guys. This is getting me closer to the goal, and explains the
> observed behaviors. Now I'm facing issues when implemented as a CAS
> Multiplier. I try creating a new CAS first with getEmptyJCas().
>
> Here are some various strategies and what resulted:
>
> * create a deserializer with the typesystem from the AE (which
>   includes types in the 'external' CAS to be deserialized)
> * use it to deserialize into the empty CAS created with getEmptyJCas()
>
> -> The deserialized CAS for some reason has only the base TOP typesystem
> -> Trying to access an annotation from an index (that should be there)
>    generates the "used in Java code, but was not declared in the XML type
>    descriptor" exception
>
> * same as above, but use CasCopier to try and copy the type system
>   (and everything else) from the CAS in the AE's process() method
>   into the empty CAS
>
> -> Attempted to copy a FeatureStructure of type "(my type name)", which is
>    not defined in the type system of the destination CAS.
>
> It seems the ONLY way to obtain a CAS (empty or otherwise) that has the type
> system able to accept the external CAS being deserialized is to use the very
> CAS passed into the AE's process() method. Doing so obviously mangles that
> CAS for the rest of the pipeline.
>
>
> On 3/15/2012 1:50 PM, Marshall Schor wrote:
>>
>> On 3/15/2012 10:38 AM, Eric Riebling wrote:
>>>
>>> I have a pipeline with its own type system.
>>> I also have deserialized, annotated CASes on disk with a different type
>>> system.
>>> Suppose I want an Analysis Engine in the pipeline to read in the
>>> deserialized CASes in order to obtain annotations and 'do things with them'
>>>
>>> I understand some limitations in the UIMA framework prevent this, but
>>> could it be done by making the first type system include that of the
>>> CASes to deserialize?
>>
>> Yes, I think so.
>>>
>>> Also, it would necessitate creating new CASes within the Analysis Engine.
>>> I could think of several approaches, and have tried some without success:
>>>
>>> * Create a new, 'temporary' View in the AE's process() method, obtain a
>>>   JCas, obtain its CAS, and use that to store the deserialized CASes
>>>   (seems to mangle the original CAS and break downstream AEs in the
>>>   pipeline, and seems to not be able to find any annotations in the
>>>   deserialized CAS)
>>>
>> This won't work. The deserialize method effectively "resets" the CAS
>> before loading it.
>> A view is not a new CAS; it is a new view of the same CAS.
>>
>>> * Use the CAS in the process() method to store the deserialized CASes
>>>   (also mangles the original CAS, breaks downstream AEs, but DOES
>>>   permit obtaining annotations from the deserialized CASes)
>>
>> Right, deserializing into an existing CAS resets it in flight.
>>>
>>> * Make the Analysis Engine be a CAS Multiplier, and deserialize into
>>>   a CAS created with createEmptyCas()
>>>   (I haven't tried this yet)
>>
>> Yes, this is the way to get a separate CAS instance to deserialize into.
>> It's how Collection Readers do it.
>> -Marshall
>>>
>>> It's kind of a use case for a hybrid Component that behaves in some ways
>>> like an AE (has a process() method), in some ways like XMI Collection
>>> Reader, and in some ways like a CAS Multiplier.
>>>
>>> But it's a useful use case! It is also a very bizarre one because you
>>> could almost think of it as a pipeline within a pipeline, which processes
>>> a set of deserialized annotated XMI documents, within a pipeline that
>>> processes ... in our case, a Question Answering system with question
>>> keyterms, ranked lists of documents and answer candidates.
>>>
>>
>
