Re: XMI Collection Reader vs. CAS Consumer

Eric Riebling Wed, 09 Jun 2010 10:46:17 -0700

It might even make sense (I haven't tried this) to configure a CPE to
read from and write back to the same folder of XMI documents, overwriting
the input files with their enriched versions, especially on a
system where space is limited by a very large corpus.  So long as
existing annotations are preserved, and CASes reside in memory during
processing, I can't see any reason why this shouldn't work.


Eric Riebling wrote:

It seems to me like a good idea that the default/expected file extensions
read and written by the XMI CR and CAS Consumer should be the same.
As things stand, the CAS Consumer writes files like "doc0, doc1, doc2..."
but the Collection Reader ignores them because they don't have
extensions (it expects "doc0.xmi, doc1.xmi, doc2.xmi...").

I'm not sure if this is by design, for a reason, or an oversight.  I know
it's very useful to add a new annotation to a set of already-annotated
documents saved in XMI format, especially if the annotations they contain
took a lot of time to produce, but it requires one to go through the extra
step of renaming them.
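
For what it's worth, the renaming step can be scripted.  Below is a minimal
sketch in Python, assuming the CAS Consumer output sits in one flat folder
and that the Collection Reader picks up files ending in ".xmi".  The function
name add_xmi_extension and the folder path in the usage note are made up for
illustration; nothing here is part of UIMA.

```python
from pathlib import Path


def add_xmi_extension(folder, extension=".xmi"):
    """Append `extension` to every extension-less file in `folder`.

    Hypothetical helper, not part of UIMA. Returns the new file names.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories, hidden files, and files that already
        # carry an extension (e.g. doc2.xmi).
        if not path.is_file() or path.name.startswith(".") or path.suffix:
            continue
        target = path.with_name(path.name + extension)
        # Never overwrite an existing file that has the target name.
        if target.exists():
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling add_xmi_extension("/path/to/xmi/output") once after the CPE run
should leave "doc0.xmi, doc1.xmi, ..." in place of "doc0, doc1, ...", and
running it a second time changes nothing.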


--
Eric Riebling  GHC 6713,  LTI,   SCS,  CMU
412.268.9872   http://www.cs.cmu.edu/~er1k
