It might even make sense (I haven't tried this) for a CPE to be
configured to read from and write back to the same folder of XMI
documents, overwriting the input files with their enriched versions as
output.  That would help especially on a system where space is limited
because the corpus is very large.  So long as existing annotations are
preserved, and CASes reside in memory during processing, I can't see any
reason why this shouldn't work.

Eric Riebling wrote:
It seems to me like a good idea that the default/expected file extensions
read and written by the XMI CR and CAS Consumer should be the same.
As things stand, the CAS Consumer writes files like "doc0, doc1, doc2..."
but the Collection Reader ignores them because they don't have
extensions ("doc0.xmi, doc1.xmi, doc2.xmi...").

I'm not sure if this is by design, for a reason, or an oversight.  I know
it's very useful to add a new annotation to a set of already-annotated
documents saved in XMI format, especially if the annotations they contain
took a lot of time to produce, but it requires one to go through the
extra step of renaming them.
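
For what it's worth, the renaming step can be scripted.  The following is
only a minimal sketch, not part of UIMA: the function name
add_xmi_extension and the folder name in the usage note are made up for
illustration, and it assumes the consumer's output files have no
extension at all and that the reader is looking for ".xmi".

```python
from pathlib import Path


def add_xmi_extension(folder, extension=".xmi"):
    """Append `extension` to every extensionless file in `folder`.

    Hypothetical helper, not a UIMA API. Files that already have a
    suffix, hidden files, and subdirectories are left alone, and an
    existing target file is never overwritten. Returns the new paths.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.name.startswith(".") or path.suffix:
            continue
        target = path.with_name(path.name + extension)
        if target.exists():
            continue  # do not clobber an existing doc0.xmi
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling add_xmi_extension("xmi_output") on a hypothetical output folder
would turn doc0, doc1, doc2 into doc0.xmi, doc1.xmi, doc2.xmi and leave
anything that already has an extension untouched.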


--
Eric Riebling  GHC 6713,  LTI,   SCS,  CMU
412.268.9872   http://www.cs.cmu.edu/~er1k
