It is closer to scenario A) that you described.
I have a number of input text files (possibly a large number), and for
each one I have two XMI files that contain the annotation objects: one
from our analysis engine pipeline and the other from the human annotator.
So for each input text file, I need to load both XMI files into memory and
compare them. My output will be any differences that I can find between
each pair of such XMI files.
Now, the challenge is that both of the XMI files were generated with
"_InitialView" as the sofa ID. So when I try to load them into the
underlying CASes of two separate views, it does not work, mainly because
of the identical sofa IDs embedded in the XMI files. I then tried to use
one CAS to deserialize the two XMI files sequentially, but I am having
trouble emptying the CAS with the CAS.reset() method after loading the
first XMI file.
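For the comparison step itself (reporting the differences between each pair of XMI files), a minimal sketch in plain Java follows. It assumes the annotations from each file have already been flattened into simple (type, begin, end) records; the `Span` class and the `onlyIn` method are hypothetical names made up for illustration and are not part of the UIMA API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class AnnotationDiff {

    /** Hypothetical flattened annotation: a type name plus character offsets. */
    public static final class Span {
        final String type;
        final int begin;
        final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Span)) return false;
            Span s = (Span) o;
            return type.equals(s.type) && begin == s.begin && end == s.end;
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, begin, end);
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    /** Returns the spans in 'left' that have no exact match in 'right', in 'left' order. */
    public static List<Span> onlyIn(List<Span> left, List<Span> right) {
        Set<Span> rightSet = new HashSet<>(right);
        List<Span> result = new ArrayList<>();
        for (Span s : left) {
            if (!rightSet.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: the two sides disagree on the end offset of the Date span.
        List<Span> engine = Arrays.asList(new Span("Person", 0, 5), new Span("Date", 10, 20));
        List<Span> human = Arrays.asList(new Span("Person", 0, 5), new Span("Date", 10, 18));

        System.out.println("Only in engine output: " + onlyIn(engine, human));
        System.out.println("Only in human output:  " + onlyIn(human, engine));
    }
}
```

Exact matching on type and offsets is only one possible definition of "difference"; comparing feature values or allowing overlapping spans would need a different equality test.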
James
On 7/25/2011 11:50 AM, Burn Lewis wrote:
I'm not clear on how you'd like this pipeline to work, e.g. what are the
inputs & outputs of this annotator?
A) Are you feeding it a stream of CASes created from the pairs of XMI files
to be compared? e.g. X1 Y1 X2 Y2 X3 Y3 ...
where X1 & Y1 are to be compared, then X2 & Y2, etc. If so, then your annotator
could extract the information from the 1st and save it locally to compare
with the information in the 2nd member of the pair when it arrives.
B) Or does the input CAS merely contain the names of the two XMI files to be
compared, in which case you should follow Eddie's suggestion and implement
it as a CAS Multiplier so that it can create a couple of empty CASes to
deserialize into.
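The buffering idea in scenario A (hold on to the information from the first member of a pair, then compare when the second arrives) can be sketched in plain Java as below. This is only an illustration of the pattern, not UIMA code: `PairwiseComparer` and `accept` are hypothetical names, and in a real annotator the `accept` call would correspond roughly to the per-CAS processing method, with the buffered item being information extracted from the CAS rather than the CAS itself.

```java
import java.util.Optional;
import java.util.function.BiFunction;

public class PairwiseComparer<T, R> {
    private final BiFunction<T, T, R> compare;
    private T pending;           // information saved from the first member of the pair
    private boolean hasPending;  // true while a first member is waiting for its partner

    public PairwiseComparer(BiFunction<T, T, R> compare) {
        this.compare = compare;
    }

    /**
     * Feeds one item from the stream X1 Y1 X2 Y2 ...
     * Returns an empty Optional for the first member of a pair, and the
     * comparison result once the second member arrives.
     */
    public Optional<R> accept(T item) {
        if (!hasPending) {
            pending = item;
            hasPending = true;
            return Optional.empty();
        }
        T first = pending;
        pending = null;
        hasPending = false;
        return Optional.of(compare.apply(first, item));
    }

    public static void main(String[] args) {
        // Made-up stream of labels standing in for the extracted information.
        PairwiseComparer<String, String> comparer =
                new PairwiseComparer<>((x, y) -> x + " vs " + y);
        for (String item : new String[] {"X1", "Y1", "X2", "Y2"}) {
            comparer.accept(item).ifPresent(System.out::println);
        }
    }
}
```

The sketch relies on the items arriving strictly in pair order; if the stream could be reordered or interrupted, the pairs would need an explicit key (for example the source document name) instead of position.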
Since the deserialize calls reconstruct a complete CAS, they can only be
applied to empty CASes, so they are usually made in Collection Readers or
CAS Multipliers.
~Burn