Dear All

Let's say I want to count the occurrences of each word in a document
collection and then use these counts (possibly in the same workflow).
I have one CAS per document and I want to scale out the workflow.

To scale out the workflow, I use a resource to store the count of
each word. The resource is written to by several instances of an
annotator, which process distinct CASes in parallel.
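
For concreteness, here is a minimal sketch of the kind of counter I
mean, in plain Java with no UIMA dependency. The class name WordCounts
and its methods are hypothetical; in a real pipeline the object would
be exposed to the annotators as a shared resource. It only shows that
concurrent increments can be made safe; it does not tell a reader when
the counts are final.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical shared counter (not a UIMA class): safe to update from
// several annotator instances running in parallel.
final class WordCounts {
    private final ConcurrentHashMap<String, LongAdder> counts =
            new ConcurrentHashMap<>();

    // Called by each counting annotator for every token it sees.
    void increment(String word) {
        counts.computeIfAbsent(word, w -> new LongAdder()).increment();
    }

    // Current count for a word; 0 if the word has never been seen.
    long get(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0L : adder.sum();
    }

    // Sorted copy of the counts at the time of the call. If writers are
    // still running, the copy may already be out of date when returned.
    Map<String, Long> snapshot() {
        Map<String, Long> copy = new TreeMap<>();
        counts.forEach((word, adder) -> copy.put(word, adder.sum()));
        return copy;
    }
}
```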

Here are my questions:
* I believe I cannot be sure that, when a later annotator in the same
workflow reads the resource, the resource will not still be modified
afterwards (by counter annotator instances that are still processing
the remaining CASes). Right? In other words, I have no way to delay
the run of an annotator depending on the state of a resource?
* So I may use two workflows: one to build the resource, the other
to use it. But how can I export/save the resource? I cannot access
the resource in the collectionProcessComplete method of an AE, can I?

The solution I have in mind was inspired by the use of a CAS
multiplier to merge CASes. It uses two workflows, one of them
dedicated to building the resource. In that workflow, I define an
annotator that is not scaled out (so, a CAS consumer). In that
annotator, I check the SourceDocumentInformation feature structure in
the CAS to see whether its lastSegment feature is set to true; if so,
I can export the resource. I know this is not a guarantee that all
CASes have been processed. I could also have a special counter
resource in that annotator to count the processed CASes and export
the desired resource once all CASes have been processed. In that
case, I would need a way to tell the "exporter" annotator the number
of CASes that will be processed... This is not the main problem.
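
The "count the processed CASes" idea could be sketched roughly as
follows, again in plain Java. CompletionTracker and recordProcessed
are hypothetical names, and the sketch assumes the expected number of
CASes is known up front, which is exactly the part I do not yet know
how to communicate.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not a UIMA class): tells the caller when the
// expected number of CASes has been processed.
final class CompletionTracker {
    private final int expected;
    private final AtomicInteger processed = new AtomicInteger();

    // Assumption: the total number of CASes is known in advance.
    CompletionTracker(int expected) {
        if (expected <= 0) {
            throw new IllegalArgumentException("expected must be positive");
        }
        this.expected = expected;
    }

    // Records one processed CAS. Returns true for exactly one caller,
    // the one whose call brings the total to the expected number; that
    // caller would then export the resource.
    boolean recordProcessed() {
        return processed.incrementAndGet() == expected;
    }
}
```

The exporter would call recordProcessed() once per CAS and write the
counts out only when it returns true.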

After writing that, I realize that, to do it in a single workflow, I
could write a CAS multiplier that holds each CAS until all have been
processed and then creates as many new CASes as were held...
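
Stripped of all UIMA specifics, the hold-then-release behaviour would
look something like the sketch below. HoldAllBuffer is a hypothetical
name, the expected count is again assumed to be known, and everything
is kept in memory until the last item arrives, which is part of why
this feels heavy to me.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical buffer (not a UIMA class): holds items until the
// expected number has arrived, then releases them all at once.
final class HoldAllBuffer<T> {
    private final int expected;
    private final List<T> held = new ArrayList<>();

    // Assumption: the total number of items is known in advance.
    HoldAllBuffer(int expected) {
        if (expected <= 0) {
            throw new IllegalArgumentException("expected must be positive");
        }
        this.expected = expected;
    }

    // Stores the item. Returns an empty list until the final item
    // arrives, then returns every held item in arrival order.
    synchronized List<T> offer(T item) {
        held.add(item);
        if (held.size() < expected) {
            return Collections.emptyList();
        }
        List<T> released = new ArrayList<>(held);
        held.clear();
        return released;
    }
}
```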

These solutions are very complex...

Any suggestions? A uimaFIT trick =)?

Thanks for your ideas

/Nicolas
