Re: XmiCasDeserializer problem with multi-views

James Jichun Zhu Mon, 25 Jul 2011 12:25:27 -0700

It is closer to scenario A) that you described.

I have a number of input text files (could be a large number), and for each I have two XMI files that contain the annotation objects, one from our analysis engine pipeline and the other from the human annotator. So for each input text file, I will need to load them both in memory and compare them. My output will be any differences that I can find between each pair of such XMI files.
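
As a rough illustration of the comparison step only, here is a minimal sketch in plain Java. It assumes each annotation has already been reduced to a type name plus begin/end offsets; the `Span` class and the `onlyIn` helper are hypothetical names invented for this example and are not part of UIMA.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

class AnnotationDiff {

    /** Hypothetical stand-in for one annotation: type name plus character offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) {
                return false;
            }
            Span s = (Span) o;
            return type.equals(s.type) && begin == s.begin && end == s.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, begin, end);
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    /** Returns the spans present in left but absent from right, in left's order. */
    static Set<Span> onlyIn(Set<Span> left, Set<Span> right) {
        Set<Span> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }
}
```

Calling `onlyIn(system, human)` and then `onlyIn(human, system)` would give the two one-sided difference lists for a pair of files. Exact matching on offsets is only one possible definition of "difference"; a real comparison might also want partial-overlap matching or feature-level checks.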

Now, the challenge is that both of the XMI files were generated with "_InitialView" as the sofa ID. So when I try to load them into the underlying CASes of two separate views, it does not work, mostly because of the identical sofa IDs embedded in the XMI files. Then I tried to use one CAS to deserialize the 2 XMI files sequentially. But I am having trouble emptying the CAS with the CAS.reset() method after loading the first XMI file.


James


On 7/25/2011 11:50 AM, Burn Lewis wrote:

I'm not clear on how you'd like this pipeline to work, e.g. what are the
inputs & outputs to this annotator?

A) Are you feeding it a stream of CASes created from the pairs of XMI files
to be compared? e.g. X1 Y1 X2 Y2 X3 Y3 ...
where X1 & Y1 are to be compared, then X2 Y2 etc.  If so then your annotator
could extract the information from the 1st and save it locally to compare
with the information in the 2nd member of the pair when it arrives.

B) Or does the input CAS merely contain the names of the two XMI files to be
compared, in which case you should follow Eddie's suggestion and implement
it as a CAS Multiplier so that it can create a couple of empty CASes to
deserialize into.

Since the deserialize calls reconstruct a complete CAS they can only be
applied to empty CASes so are usually made in Collection Readers or CAS
Multipliers.

~Burn
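
The buffering idea in scenario A can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `PairwiseComparer` and its methods are hypothetical names, not UIMA API, and `T` stands for whatever information the annotator extracts from each CAS (not the CAS itself, since the framework may reuse it for the next document). Items are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Buffers the first member of each pair and compares it when the second arrives. */
class PairwiseComparer<T, R> {
    private final BiFunction<T, T, R> compare;
    private final List<R> results = new ArrayList<>();
    private T pending; // first member of the current pair, or null if none is waiting

    PairwiseComparer(BiFunction<T, T, R> compare) {
        this.compare = compare;
    }

    /** Call once per arriving item, in stream order X1 Y1 X2 Y2 ... */
    void accept(T item) {
        if (pending == null) {
            // First member of a pair: save it locally and wait for its partner.
            pending = item;
        } else {
            // Second member: compare against the saved one, then clear the buffer.
            results.add(compare.apply(pending, item));
            pending = null;
        }
    }

    /** Comparison results so far, one per completed pair. */
    List<R> results() {
        return results;
    }

    /** True if a first member is still waiting for its partner. */
    boolean hasUnpaired() {
        return pending != null;
    }
}
```

In an annotator, `accept` would correspond to the work done on each incoming CAS. The approach relies on the stream order being strictly X1 Y1 X2 Y2 ...; checking `hasUnpaired()` at the end of the collection is one way to detect a missing partner.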
