Eddie Epstein wrote:
I am not sure whether I should run async or not. Right now
the analysis is running on one quad-core server.
Now I would like to set up UIMA AS in a way that
it uses all the CPU time of all cores for fetching/writing
documents to and from HBase and for analysis.
The interaction with HBase leaves the threads idle
for short periods of time; that's why I need maybe
10 threads for fetching and 10 threads for writing
to pump enough documents through the machine
to keep it busy.

Having the AAE async would have the advantage for me
that I only need 10 instances of the fetching CM and 10
instances of the writing delegate AE, and not 20 instances
of the whole AAE. The same is true for analysis: there
I can scale just the AEs which are slow.
To scale the CM, though, I have to use the suggested
workaround.

So all in all I think having it async would be an advantage,
but for now it would be just fine not to have it async because
that seems easier.

Assuming that your
AE runs correctly as a single-threaded aggregate, creating multiple
instances of it seems fine. The correction to your previous deployment
descriptor would just be:

         <analysisEngine key="TextAnalysis" async="false">
             <scaleout numberOfInstances="8" />
         </analysisEngine>

From the UIMA AS point of view, this component is not a CasMultiplier
because [I assume] it consumes new CASes internally and does not
return them.

Let me emphasize that before AS scaleout the aggregate should be tested
as a simple UIMA aggregate with the normal tools like CVD, runAE,
or a custom driver.

I tested the correction but got the first exception again.
Here is the full stack trace now, not only the cause:


Does this error happen right away, or randomly after some period of
processing? Can you confirm that if you run this configuration with
scaleout=1 there is no problem?

Yes, with numberOfInstances=1 it works.

Here is the configuration again:
<analysisEngine async="false">
   <scaleout numberOfInstances="1" />
</analysisEngine>

Now I changed numberOfInstances to 2.
The first CAS goes through without an error, the
second CAS throws the exception, and the third goes through
without an error; the fourth CAS throws the exception again, and then I stopped
debugging. I used today's 2.3.0-SNAPSHOT for the test.
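
For reference, the failing configuration would presumably be the same
fragment with only the instance count changed (a sketch, assuming nothing
else in the descriptor was edited):
<analysisEngine async="false">
   <scaleout numberOfInstances="2" />
</analysisEngine>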

To me it looks as if only one of the two AAE instances works properly.

Jörn

