Re: Run an analysis engine after processing document collection?

Benedict Holland Sun, 24 Dec 2017 22:24:08 -0800

Actually, that ended up being the solution. I checked whether the collection
engine was finished and, if it was not, slept for 5 seconds. Once it was
finished, I was able to run another analysis. It works so well that I am now
chaining together engines over collections and processing individual steps.
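
In case it helps anyone else, here is a rough sketch of the check-and-sleep loop in plain Java. It is not UIMA code: StageChain, Stage, awaitCompletion and runInOrder are made-up names for illustration, and the isProcessing() check is a stand-in that you would back with whatever "still running" status your own engine exposes.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of UIMA: waits for one stage to finish
// before the next one is started.
final class StageChain {

    /** Hypothetical stand-in for an engine that works through a collection asynchronously. */
    interface Stage {
        void start();
        boolean isProcessing();
    }

    /**
     * Blocks until stillProcessing returns false, sleeping pollMillis between checks.
     * Returns how many times it had to sleep.
     */
    static int awaitCompletion(BooleanSupplier stillProcessing, long pollMillis) {
        int sleeps = 0;
        while (stillProcessing.getAsBoolean()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for stage", e);
            }
            sleeps++;
        }
        return sleeps;
    }

    /** Starts each stage in order and waits for it to finish before moving on. */
    static void runInOrder(long pollMillis, Stage... stages) {
        for (Stage stage : stages) {
            stage.start();
            awaitCompletion(stage::isProcessing, pollMillis);
        }
    }
}
```

With a 5 second poll this would be called as StageChain.runInOrder(5000L, firstStage, secondStage). Polling is crude compared with a completion callback, but it keeps each stage independent of the others.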


Thanks for the suggestion!

On Sat, Dec 23, 2017 at 1:22 PM, Jens Grivolla <[email protected]> wrote:

> Hi Ben,
>
> if I understand correctly you want to run a process once the whole
> collection has been analyzed. You can have an AnalysisEngine that does this
> by implementing
> http://uima.apache.org/d/uimaj-2.10.0/apidocs/org/apache/uima/analysis_engine/AnalysisEngine.html#collectionProcessComplete()
>
> You just need to make sure that you gather all the necessary information
> somehow. If the AE that calculates the statistics is at the end of the
> pipeline and you have only one instance of it, it's easy to gather all the
> information there. Or you could just write everything you need to a
> centralized datastore (e.g. a database) and use that to calculate the
> statistics.
>
> If I didn't misunderstand you, that's really quite a common scenario.
>
> Best,
> Jens
>
> On Fri, Dec 22, 2017 at 6:26 PM, Benedict Holland <
> [email protected]> wrote:
>
> > Hello All,
> >
> > I find myself in a strange situation. I have a content processing engine
> > working. I have N threads populating N CAS objects and running my pipeline.
> > Each CAS object gets 1 piece of data, like say a row in a database. Each
> > process is entirely independent and can run concurrently. I specifically
> > did not configure this pipeline as an aggregate process as I don't really
> > care when the events trigger since the CPE maintains the order of the
> > engines.
> >
> > Now I want to add an analysis that will run over the aggregate output.
> > For example, I processed N texts using the CPE and now I want to run a
> > TF-IDF analysis over the entire corpus. The TF-IDF analysis should only
> > run once all documents are processed.
> >
> > How would I go about doing this? Does this have to do with not allowing
> > multiple deployments?
> >
> > Thanks,
> > ~Ben
> >
>
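
A minimal sketch of the approach Jens describes above: gather per-document counts as each document is processed, then compute the collection-wide statistics once at the end. This is plain Java, not a UIMA annotator. TfIdfAccumulator is a hypothetical class whose two methods only mirror the per-document process() and final collectionProcessComplete() lifecycle. It assumes whitespace tokenization and the simple variant tf-idf = raw term count * ln(N / document frequency); other weightings are equally valid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not a UIMA class: it only mirrors the lifecycle of
// per-document process() calls followed by one collectionProcessComplete().
final class TfIdfAccumulator {
    private final List<Map<String, Integer>> termCountsPerDoc = new ArrayList<>();
    private final Map<String, Integer> docFrequency = new HashMap<>();

    /** Called once per document; synchronized because several threads may feed one instance. */
    synchronized void process(String documentText) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : documentText.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        // Each distinct term in this document adds one to its document frequency.
        for (String term : counts.keySet()) {
            docFrequency.merge(term, 1, Integer::sum);
        }
        termCountsPerDoc.add(counts);
    }

    /** Called once after the last document; returns one term-to-score map per document. */
    synchronized List<Map<String, Double>> collectionProcessComplete() {
        int n = termCountsPerDoc.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Integer> counts : termCountsPerDoc) {
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double idf = Math.log((double) n / docFrequency.get(e.getKey()));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            result.add(scores);
        }
        return result;
    }
}
```

Keeping all state in a single instance corresponds to the "only one instance at the end of the pipeline" case. With several instances, the counts would have to go to a shared store instead, as suggested in the quoted reply. Note that when documents arrive from multiple threads, the order of the returned list follows arrival order, not any original document order.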
