On 6/4/07, KANO, Yoshinobu <[EMAIL PROTECTED]> wrote:
# org.apache.uima.flow.ParallelStep is not included in the Apache UIMA 2.1.0 release, is it?
No, it was added after the release was cut. It will be in 2.2.
Let me explain my problem again, a bit more precisely. My purpose concerns both latency and throughput. To reduce latency, we need the ParallelStep with truly concurrent (multi-threaded) processing, as you wrote. In this case, our goal is a kind of demonstration. Your explanation helped me understand the current implementation, and that is enough for me regarding the latency problem. I will simply wait for a truly concurrent implementation to be done someday. About the throughput issue, please assume that we have a multi-core/multi-CPU machine, or remote machines accessed as web services.
a. When the resources are multiple cores, CPUs, or nodes, does Apache UIMA Flow create a new thread for each AnalysisEngine? Or does it always use a single thread for the entire workflow?
If you are using a Collection Processing Engine, you specify the number of processing pipelines in the CPE Descriptor's "processingUnitThreadCount" attribute. This lets you utilize your multiple cores. If you are not using a Collection Processing Engine, see (b) below.
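To make the CPE answer concrete: the processingUnitThreadCount attribute itself lives in the XML CPE descriptor, but its effect can be pictured with a plain-Java sketch. The code below does not use UIMA at all; PipelineThreadsSketch, runPipelines, and the token-counting analyze method are hypothetical stand-ins. A shared queue plays the role of the collection reader, and threadCount identical pipelines each take one document at a time, which is roughly how several processing pipelines keep multiple cores busy.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PipelineThreadsSketch {

    // Hypothetical stand-in for one pipeline's analysis: counts whitespace-separated tokens.
    static int analyze(String doc) {
        String trimmed = doc.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Runs `threadCount` identical pipelines pulling from a shared queue,
    // loosely mirroring the CPE's processingUnitThreadCount setting (not UIMA code).
    static Map<Integer, Integer> runPipelines(List<String> docs, int threadCount) {
        // Stand-in for the collection reader: a shared queue of document indices.
        Queue<Integer> reader = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < docs.size(); i++) {
            reader.add(i);
        }
        Map<Integer, Integer> results = new ConcurrentHashMap<>();
        ExecutorService pipelines = Executors.newFixedThreadPool(threadCount);
        for (int t = 0; t < threadCount; t++) {
            pipelines.submit(() -> {
                Integer idx;
                // Each pipeline handles one document at a time until the queue is drained.
                while ((idx = reader.poll()) != null) {
                    results.put(idx, analyze(docs.get(idx)));
                }
            });
        }
        pipelines.shutdown();
        try {
            pipelines.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

For example, runPipelines(List.of("a b c", "hello world"), 2) maps index 0 to 3 and index 1 to 2. Adding threads helps only while there are idle cores; each pipeline is still strictly sequential internally.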
b. There is a class named "MultiprocessingAnalysisEngine_impl". Does it mean that it can start processing another CAS before finishing a previous CAS? In other words, does it mean that this AnalysisEngine is multi-threaded and can process two or more CASes simultaneously?
The MultiprocessingAnalysisEngine_impl internally keeps a pool of AEs, each of which processes one CAS at a time. Therefore the MultiprocessingAnalysisEngine as a whole can process multiple CASes at the same time. Note that you don't construct this class directly - instead, call UIMAFramework.produceAnalysisEngine and pass the optional argument that specifies how many concurrent requests you need to be able to process. See "Multi-threaded Applications" in the UIMA Tutorials and Users Guides book for more information.
-Adam
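The pooling idea behind MultiprocessingAnalysisEngine_impl can be sketched the same way. This is again a hypothetical, UIMA-free illustration (EnginePoolSketch and its inner Engine class are made up for this example), not the framework's code: a fixed number of single-threaded engines sit in a blocking queue, each call borrows one, and callers wait when all of them are busy. The pool size plays the role of the optional concurrent-request count passed to UIMAFramework.produceAnalysisEngine.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class EnginePoolSketch {

    // Hypothetical single-threaded engine: it must never see two CASes at once.
    static class Engine {
        private final AtomicBoolean busy = new AtomicBoolean(false);

        String process(String cas) {
            if (!busy.compareAndSet(false, true)) {
                throw new IllegalStateException("engine used concurrently");
            }
            try {
                return cas.toUpperCase(); // stand-in for real analysis
            } finally {
                busy.set(false);
            }
        }
    }

    private final BlockingQueue<Engine> idle;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    // `size` engines are created up front, like the "how many concurrent requests" argument.
    EnginePoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new Engine());
        }
    }

    // Safe to call from many threads: each call borrows one engine, blocking until
    // one is free, so at most `size` CASes are processed simultaneously.
    String process(String cas) throws InterruptedException {
        Engine e = idle.take();
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            return e.process(cas);
        } finally {
            inFlight.decrementAndGet();
            idle.put(e); // return the engine to the pool
        }
    }

    int maxObservedConcurrency() {
        return maxInFlight.get();
    }
}
```

Running many client threads against a pool of 3 engines shows that no single engine ever handles two CASes at once, while up to 3 CASes are in flight together. A blocking queue was chosen here so that callers simply wait for a free engine rather than failing.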
