Roberto Franchini wrote:
> hi,
> I'm running collection processing on a quad core with 8G of RAM.
> I run the CPE with 4 threads and I tried various sizes for the CAS
> pool. I patched the CPE so now I can set the CAS pool size bigger than 60,
> but the result is always the same:
> The CAS processor pool is empty. (Thread Name: [Procesing Pipeline#4
> Thread]::) Total size: 1 Free in pool: 0
> 
> This happens with 60, 100, or even 240!
> Actually this pipeline is able to analyze 60k documents per hour. It's
> good, but I hope to reach 100k docs/h. The old one (not UIMA-based)
> did 24000 docs/h single-threaded, and I was able to run 3 pipelines in
> parallel (three processes).
> This new pipeline does more work, and I'm able to run 2 pipelines in
> different processes to achieve 90k docs/h. To run 2 pipelines I
> have to limit threads to 2.
> The major limiting factor is the creation of a lot of temporary
> objects, so this is the JVM configuration to mitigate it:
> -Xmx3072M -XX:NewSize=1024M -XX:ParallelGCThreads=4 \
> -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit \
> -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> 
> I wonder if there's a right CAS pool size to increase the analysis speed.
> If necessary I can modify queue sizes (work queue, consumer queue).
> Any suggestion?
> Roberto

Personally, I have long since given up using the CPE.  When I
need to scale on a single machine like you do, I do document
ingestion outside of UIMA and instantiate my processing
pipeline n times myself.  Then I have an input queue with documents,
and do the output from a CAS consumer or such.  Each processing
pipeline has exactly one CAS.  It processes a document, and when
done, gets the next one from the input queue.  Repeat until done.
You need to do a little threading yourself this way, but it's
really minimal.  This way you can forget about fiddling with
CPE parameters.
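
To make that concrete, here is a rough sketch of what I mean, assuming
a hypothetical aggregate descriptor at desc/MyPipeline.xml and plain-text
documents; the queue, the descriptor path, and the poison-pill shutdown
are placeholders, not a drop-in solution.  The point is only that each
thread owns one AnalysisEngine and exactly one CAS and reuses them:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.cas.CAS;
import org.apache.uima.util.XMLInputSource;

public class ParallelPipelines {

  // Unique end-of-input marker; reference comparison avoids clashes
  // with real (possibly empty) documents.
  private static final String POISON = new String("<EOF>");
  private static final int N_THREADS = 4;   // e.g. one pipeline per core

  public static void main(String[] args) throws Exception {
    final BlockingQueue<String> docs = new LinkedBlockingQueue<String>(1000);

    // Hypothetical descriptor path -- point this at your own aggregate AE.
    AnalysisEngineDescription desc = UIMAFramework.getXMLParser()
        .parseAnalysisEngineDescription(new XMLInputSource("desc/MyPipeline.xml"));

    Thread[] workers = new Thread[N_THREADS];
    for (int i = 0; i < N_THREADS; i++) {
      // Each worker owns its own AE instance and exactly one CAS.
      final AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(desc);
      final CAS cas = ae.newCAS();
      workers[i] = new Thread(new Runnable() {
        public void run() {
          try {
            while (true) {
              String text = docs.take();
              if (text == POISON) {
                break;
              }
              cas.setDocumentText(text);
              ae.process(cas);   // output is done by a CAS consumer inside the AE
              cas.reset();       // reuse the same CAS for the next document
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      workers[i].start();
    }

    // Document ingestion happens outside UIMA: read your documents however
    // you like and put their text on the queue, e.g. docs.put(documentText);
    // then shut the workers down with one poison pill each.
    for (int i = 0; i < N_THREADS; i++) {
      docs.put(POISON);
    }
    for (Thread w : workers) {
      w.join();
    }
  }
}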

There may be times when this approach is not appropriate, but
I have yet to see such a case :-)

--Thilo
