Hi Frank,
At 11:29 AM 8/29/2007, you wrote:
> Is there a reason that the configuration parameters that are being read
> by the last annotator can't be an annotation on the given document?
They have nothing to do with the document per se.
> For example, one can imagine a pipeline where there is an Analysis Engine
> that checks the language of the document and then a separate
> morphological tokenizer creates morpheme annotations using this
> information. The natural way to do this now would be to set a new
> DocumentAnnotation on the document with the appropriate language and
> have the tokenizer AE read this.
Yes, in that case the CAS would be used pretty much as it was
intended. I'm stumbling on the conceptual mismatch between my
configuration variables and an "annotation."
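
To make the per-document case concrete, here is a rough sketch of that language-then-tokenizer pipeline in plain Java. It deliberately does not use the UIMA API: Doc, Stage, LanguageDetector, and Tokenizer are hypothetical stand-ins for the CAS and two annotators, and the language check is a toy placeholder. It only illustrates an upstream stage recording a per-document value that a downstream stage reads.

```java
import java.util.ArrayList;
import java.util.List;

public class LanguagePipelineSketch {

    /** Hypothetical stand-in for a CAS: document text plus per-document metadata. */
    static final class Doc {
        final String text;
        String language = "x-unspecified";           // filled in by an upstream stage
        final List<String> tokens = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stage interface, loosely analogous to an annotator. */
    interface Stage {
        void process(Doc doc);
    }

    /** Toy language check: a placeholder heuristic, not real language identification. */
    static final class LanguageDetector implements Stage {
        @Override
        public void process(Doc doc) {
            boolean looksGerman = doc.text.startsWith("Der ") || doc.text.contains(" der ");
            doc.language = looksGerman ? "de" : "en";
        }
    }

    /** Downstream stage: reads the language recorded on the document by the detector. */
    static final class Tokenizer implements Stage {
        @Override
        public void process(Doc doc) {
            for (String token : doc.text.trim().split("\\s+")) {
                doc.tokens.add(doc.language + ":" + token);
            }
        }
    }

    /** Runs both stages in order over a single document. */
    static Doc run(String text) {
        Doc doc = new Doc(text);
        List<Stage> pipeline = List.of(new LanguageDetector(), new Tokenizer());
        for (Stage stage : pipeline) {
            stage.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("Der Hund bellt");
        System.out.println(doc.language + " " + doc.tokens);
    }
}
```

The language value here changes with every document, which is why it sits comfortably on the document object. My flow description path does not change per document, which is the mismatch I mean.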
> Does the analysis engine you are dealing with wrap something that
> requires an actual configuration file, or can it take for example a
> string argument instead?
Yes, I'm creating an annotator that wraps a legacy process flow
execution system. The execution system ingests a process flow
description (a graph of work nodes) in XML, and then executes it.
Right now, I have the path to the flow description file specified in
an analysis engine parameter, which has the obvious downside of
requiring a user of the annotator to edit the engine descriptor
whenever they want to change the process flow that will be executed.
I thought that using a DataResource would allow me to store the path
in an external file, which could be easily edited by hand by a user,
and then read in by the DataResource implementation. With UIMA's
support for resource sharing, I thought it would be straightforward to
write to the file, or to UIMA's cached in-memory version of it,
for downstream annotators to use. With this approach, I could reuse
my process flow wrapper annotator multiple times within an aggregate
without needing to edit the descriptor.
I was trying to avoid describing all the details, but this should
help you better understand my scenario.
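
As a rough illustration of the external-file idea, here is a minimal sketch in plain Java that reads the flow description path from a hand-editable properties file. It does not use UIMA's resource interfaces; the class name, the key "flow.description.path", and the example path are all hypothetical, and in a real annotator the location of the properties file would presumably come from the resource binding rather than from a temp file.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class FlowPathConfigSketch {

    /** Hypothetical property key naming the flow description file. */
    static final String FLOW_PATH_KEY = "flow.description.path";

    /** Reads the flow description path from a user-editable properties file. */
    static Path readFlowPath(Path configFile) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(configFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        String value = props.getProperty(FLOW_PATH_KEY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(
                    "Missing '" + FLOW_PATH_KEY + "' in " + configFile);
        }
        return Paths.get(value.trim());
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway config file so the sketch runs on its own.
        Path config = Files.createTempFile("flow-config", ".properties");
        try {
            String contents = FLOW_PATH_KEY + "=flows/example-flow.xml\n";
            Files.write(config, contents.getBytes(StandardCharsets.UTF_8));
            System.out.println(readFlowPath(config));
        } finally {
            Files.delete(config);
        }
    }
}
```

With something like this, changing the flow means editing one line in the properties file instead of the engine descriptor. It does not address the second half of what I wanted, which is letting one annotator update the value for annotators downstream.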
> In this case it might even be best to create the string for the second
> annotator on the fly and send it directly rather than writing to the
> disk somewhere. I think that if you have access to the code it would be
> better to treat everything that changes from document to document as
> belonging on the CAS and put all the configuration parameters in the
> AE descriptor.
Yes, that may be the best approach given the current state of UIMA.
If you have any further thoughts now that I've elaborated on my
problem, I'd love to hear them.
Thanks, Andrew