Dear all,

Let's say I want to count the occurrences of each word in a document collection and then use these counters (possibly in the same workflow). I have one CAS per document and I want to scale out the workflow.
To scale out the workflow, I use a resource to store the counter for each word. The resource is written to by several instances of an annotator, which process distinct CASes in parallel.

Here are my questions:

* I believe that when a later annotator in the same workflow uses the resource, I cannot be sure that the resource will not still be modified afterwards (by counter annotator instances that are still processing the remaining CASes). Right? In other words, I have no way to delay the run of an annotator depending on the state of a resource?
* So I may use two workflows: one to build the resource, the other to use it. But how can I export/save the resource? I cannot access the resource in the collectionProcessComplete method of an AE, can I?

The solution I have in mind was inspired by the use of a CAS multiplier to merge CASes. It uses two workflows, one of them dedicated to building the resource. In that workflow, I define an annotator that is not scaled out (so, a CAS consumer). In that annotator, I check the SourceDocumentInformation feature structure in the CAS to see whether its lastSegment feature is set to true; if so, I can export the resource. I know this is not a guarantee that all CASes have been processed. I could also have a special counter resource in that annotator to count the processed CASes and export the desired resource once all CASes have been processed. In that case, I would need a way to tell the "exporter" annotator how many CASes will be processed... This is not the main problem.

After writing that, I realize that to do it in a single workflow, I could write a CAS multiplier that holds each CAS until all have been processed, then creates again as many CASes as were saved...

These solutions are very complex... Any suggestions? A uimaFIT trick =) ?

Thanks for your ideas,
/Nicolas
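
PS: To make the "counter resource plus count of processed CASes" idea concrete, here is a rough sketch in plain Java. It deliberately uses no UIMA classes: WordCountResource, addDocument, whenComplete and expectedDocuments are names I made up for illustration, each document string stands in for one CAS, and it assumes the number of documents is known up front (which is exactly the open point mentioned above).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the shared resource; not a UIMA class. */
class WordCountResource {
    // Thread-safe per-word counters, written by all parallel annotator instances.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    // Number of documents (CASes) still to be processed; assumed known in advance.
    private final AtomicInteger remaining;
    // Completed exactly once, by whichever thread finishes the last document.
    private final CompletableFuture<Map<String, Long>> done = new CompletableFuture<>();

    WordCountResource(int expectedDocuments) {
        this.remaining = new AtomicInteger(expectedDocuments);
    }

    /** Called once per document by any of the parallel writers. */
    void addDocument(String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.computeIfAbsent(token, k -> new LongAdder()).increment();
            }
        }
        // The thread that brings the counter to zero publishes the final snapshot.
        if (remaining.decrementAndGet() == 0) {
            Map<String, Long> snapshot = new TreeMap<>();
            counts.forEach((word, adder) -> snapshot.put(word, adder.sum()));
            done.complete(snapshot);
        }
    }

    /** A downstream consumer can block on this until all documents are counted. */
    CompletableFuture<Map<String, Long>> whenComplete() {
        return done;
    }
}

class WordCountDemo {
    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "The end");
        WordCountResource resource = new WordCountResource(docs.size());

        // Three worker threads play the role of the scaled-out counter annotator.
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String doc : docs) {
            pool.submit(() -> resource.addDocument(doc));
        }

        // The "exporter" waits here until the last document has been counted.
        Map<String, Long> result = resource.whenComplete().join();
        pool.shutdown();
        System.out.println(result); // prints {cat=1, dog=1, end=1, sat=2, the=3}
    }
}
```

The point of the sketch is only that the export is triggered by the resource itself once the expected number of documents has been seen, rather than by a lastSegment flag on one particular CAS. Whether something equivalent can be wired into a UIMA shared resource is what I am asking.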
