Hi!
I have created an extension of UIMA that replaces its default ASB
with a multi-threaded one, so that if you have a CAS multiplier in your
pipeline, multiple generated CASes may be processed in parallel in
different threads. It has a few warts, but should generally be much
simpler to use than UIMA-AS if you do not need fancy things like cluster
deployment.
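To give a rough idea of the fan-out, here is a minimal JDK-only sketch.
It is not the code from the repository and does not use any UIMA API:
the class name ParallelFanOutSketch, the method processInParallel and the
use of plain strings in place of CASes are all made up for illustration.
It only shows the general pattern of handing each generated item to a
worker thread and collecting the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; strings stand in for generated CASes.
public class ParallelFanOutSketch {

    /**
     * Runs the downstream step on every generated item using a fixed
     * number of worker threads, and returns the results in input order.
     */
    public static <I, O> List<O> processInParallel(
            List<I> generated, Function<I, O> downstream, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per generated item.
            List<Future<O>> futures = new ArrayList<>();
            for (I item : generated) {
                Callable<O> task = () -> downstream.apply(item);
                futures.add(pool.submit(task));
            }
            // Wait for each task; get() rethrows worker failures.
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> generated = Arrays.asList("cas-1", "cas-2", "cas-3");
        List<String> results =
                processInParallel(generated, s -> s.toUpperCase(), 2);
        System.out.println(results);
    }
}
```

The real thing is considerably messier, since the downstream steps may
themselves contain further multipliers and mergers.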
It even has some documentation now. Find it at:
https://github.com/brmson/yodaqa/tree/master/src/main/java/cz/brmlab/yodaqa/flow/asb
(Right now, it just lives as part of my YodaQA software; simply copy
that directory to your project. I can spin the package off properly
if there is enough interest in it. It shares the YodaQA licence
statement, i.e. ASL2.)
On Wed, May 20, 2015 at 03:27:20AM +0200, Petr Baudis wrote:
> I'm looking into ways to run a part of my pipeline multi-threaded:
..snip..
> (i) I'm using UIMAfit heavily, and multiple CAS multipliers and
> mergers (even within the parallel branches). So I can't use CPE.
>
> (ii) I need multi-threading, not separate processes. (I have just
> a meager 24G RAM (sigh) and one Java process with all the linguistic
> models and stuff loaded takes 3GB RAM. So I really need to load these
> resources to memory only once.)
..snip..
> However, (before actually trying) it still seems to me to be much
> easier to rewrite a piece of the stock ASB than use UIMA-AS with complex
> pipeline construed by UIMAfit... So I think I will try that first (and
> report back).
Whew, this was not so easy! It took a good few days (and a few
start-overs) to write and debug, and I learnt more about UIMAj internals
than I ever cared to. ;-) But I think I'm still happier with the result
than if I had used UIMA-AS, and it doesn't seem to deadlock or crash
anymore even on (IMHO) a fairly massive pipeline.
(What I'm bothered by the most at this point is the fixed-size CAS
pool, though there are a few more issues; I tried to document them all
as well.)
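For anyone unfamiliar with why a fixed-size pool can be a nuisance, here
is a generic JDK-only sketch. It is not UIMA's CAS pool and not the code
from the repository; FixedPoolSketch, borrow and release are hypothetical
names, and StringBuilder merely stands in for a pooled CAS. It shows the
general behaviour of a bounded pool: once every instance is checked out,
the next request has to wait until something is released.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; StringBuilder stands in for a CAS.
public class FixedPoolSketch {
    private final BlockingQueue<StringBuilder> free;

    public FixedPoolSketch(int size) {
        // Pre-allocate exactly `size` instances; the pool never grows.
        free = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            free.add(new StringBuilder());
        }
    }

    /** Returns a pooled object, or null if none is freed within the timeout. */
    public StringBuilder borrow(long timeoutMs) throws InterruptedException {
        return free.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Resets the object and puts it back into the pool. */
    public void release(StringBuilder sb) {
        sb.setLength(0);
        free.add(sb);
    }

    public static void main(String[] args) throws InterruptedException {
        FixedPoolSketch pool = new FixedPoolSketch(2);
        StringBuilder a = pool.borrow(50);
        StringBuilder b = pool.borrow(50);
        // Both instances are out, so this request times out.
        StringBuilder c = pool.borrow(50);
        System.out.println("third borrow succeeded: " + (c != null));
        pool.release(a);
        // After a release, borrowing works again.
        StringBuilder d = pool.borrow(50);
        System.out.println("borrow after release succeeded: " + (d != null));
    }
}
```

In a pipeline where a multiplier may produce an unpredictable number of
CASes before any of them is released, picking one size up front is
presumably the awkward part.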
P.S.: Would there be any interest in merging this into UIMA proper,
or at least cleaning up some UIMA API bits to simplify and future-proof
the external package? I admit up-front that I probably won't have time
to do all that work myself, but I'd be happy to cooperate with someone.
--
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton