Hi Roberto,

Generally only a limited number of CASes are required to maximize CPU
utilization. The CPE runs the collection reader in one thread, the CAS
consumers in another, and each instance of the processing pipeline in
its own separate thread. For a scenario with 4 processing pipelines,
the pool size needed to keep all threads active is therefore at least
6: one CAS for the collection reader, one for each of the 4 pipelines,
and one for the CAS consumers.
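The sizing rule amounts to one CAS per busy thread: one for the collection reader, one per processing pipeline, and one for the CAS consumers. A minimal Java sketch of that arithmetic follows; `CasPoolSizing` and `minimumCasPoolSize` are hypothetical names for illustration, not part of the UIMA API.

```java
public class CasPoolSizing {
    /**
     * Hypothetical helper: one CAS for the collection reader thread, one per
     * processing pipeline thread, and one for the CAS consumer thread.
     */
    public static int minimumCasPoolSize(int pipelineThreads) {
        if (pipelineThreads < 1) {
            throw new IllegalArgumentException("need at least one pipeline thread");
        }
        final int readerThreads = 1;
        final int consumerThreads = 1;
        return readerThreads + pipelineThreads + consumerThreads;
    }

    public static void main(String[] args) {
        System.out.println(minimumCasPoolSize(4)); // prints 6
    }
}
```

Any CASes beyond this number mostly end up waiting in a queue rather than keeping another thread busy.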

If the pool size is bigger than the number of working threads, the
collection reader will be called to fill up the extra CASes and they
will wait in a queue for the next processing pipeline to pick them up.
After the CAS consumer thread is done with a CAS it is released back
into the pool.
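To make that flow concrete, here is a rough simulation built on plain java.util.concurrent queues instead of the real CPE classes. The integer "CASes", the queue names, and the poison-pill shutdown are all assumptions for illustration. The point is that the reader blocks once every CAS is checked out, and only the consumer returns CASes to the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CasPoolSimulation {
    // Poison pill telling a stage to shut down; it never comes from the pool.
    private static final int POISON = -1;

    /** Returns {documents processed, CASes free in the pool at the end}. */
    public static int[] run(int poolSize, int pipelines, int documents) {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(poolSize);
        for (int id = 0; id < poolSize; id++) {
            pool.add(id); // every simulated CAS starts out free
        }
        BlockingQueue<Integer> workQueue = new LinkedBlockingQueue<>();   // reader -> pipelines
        BlockingQueue<Integer> outputQueue = new LinkedBlockingQueue<>(); // pipelines -> consumer
        AtomicInteger processed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        // Collection reader: blocks on pool.take() whenever every CAS is in use.
        threads.add(new Thread(() -> {
            try {
                for (int doc = 0; doc < documents; doc++) {
                    workQueue.put(pool.take());
                }
                for (int p = 0; p < pipelines; p++) {
                    workQueue.put(POISON);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        // Processing pipelines: each takes the next queued CAS and passes it on.
        for (int p = 0; p < pipelines; p++) {
            threads.add(new Thread(() -> {
                try {
                    while (true) {
                        int cas = workQueue.take();
                        outputQueue.put(cas); // analysis would happen before this
                        if (cas == POISON) {
                            break;
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        // CAS consumer: releases each CAS back into the pool when done with it.
        threads.add(new Thread(() -> {
            try {
                int finishedPipelines = 0;
                while (finishedPipelines < pipelines) {
                    int cas = outputQueue.take();
                    if (cas == POISON) {
                        finishedPipelines++;
                    } else {
                        processed.incrementAndGet();
                        pool.put(cas);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for stages", e);
        }
        return new int[] {processed.get(), pool.size()};
    }

    public static void main(String[] args) {
        int[] result = run(6, 4, 1000);
        System.out.println("processed=" + result[0] + " free=" + result[1]);
    }
}
```

With a pool of 6 and 4 pipelines this prints `processed=1000 free=6`. Raising the pool size in this model only lets more CASes sit in the work queue; it does not add processing threads.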

The error message suggests that some component running in a processing
pipeline thread is asking the pool for a CAS, which it should not be
doing. My advice would be to reduce the pool size to 6 and turn up UIMA
logging to FINEST to see more detail on what is happening.
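If UIMA is using its java.util.logging back end (an assumption; check which logger your UIMA version is configured with), FINEST can also be switched on programmatically before the CPE is started. This sketch further assumes the relevant loggers live under the `org.apache.uima` name hierarchy; `UimaFinestLogging` is a hypothetical class name.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class UimaFinestLogging {
    // Assumption: UIMA's loggers are named under "org.apache.uima".
    // Keep a strong reference so the configured logger is not garbage-collected.
    private static final Logger UIMA_LOGGER = Logger.getLogger("org.apache.uima");

    /** Call once, before starting the CPE. */
    public static Logger enableFinest() {
        UIMA_LOGGER.setLevel(Level.FINEST); // child loggers inherit this level
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST);     // ConsoleHandler defaults to INFO
        UIMA_LOGGER.addHandler(handler);
        UIMA_LOGGER.setUseParentHandlers(false); // avoid duplicate output via the root handler
        return UIMA_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = enableFinest();
        logger.finest("FINEST logging enabled for org.apache.uima");
    }
}
```

The same effect can usually be had without code by pointing `-Djava.util.logging.config.file` at a properties file that sets `org.apache.uima.level = FINEST` and `java.util.logging.ConsoleHandler.level = FINEST`.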

Regards,
Eddie

On Mon, Jan 26, 2009 at 6:46 PM, Roberto Franchini
<[email protected]> wrote:
> Hi,
> I'm running collection processing on a quad-core machine with 8 GB of RAM.
> I run the CPE with 4 threads and I have tried various sizes for the CAS
> pool. I patched the CPE so that I can now set a CAS pool size bigger
> than 60, but the result is always the same:
> The CAS processor pool is empty. (Thread Name: [Procesing Pipeline#4 Thread]::) Total size: 1 Free in pool: 0
>
> This happens with 60, 100, or even 240!
> Currently this pipeline is able to analyze 60k documents per hour. That's
> good, but I hope to reach 100k docs/h. The old pipeline (not UIMA-based)
> did 24,000 docs/h single-threaded, and I was able to run 3 pipelines in
> parallel (three processors).
> This new pipeline does more work, and I'm able to run 2 pipelines in
> separate processes to achieve 90k docs/h. To run 2 pipelines I have to
> limit the threads to 2.
> The major limiting factor is the creation of a lot of temporary
> objects, so this is the JVM configuration I use to mitigate it:
> -Xmx3072M -XX:NewSize=1024M -XX:ParallelGCThreads=4 \
> -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit \
> -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>
> I wonder whether there is an optimal CAS pool size to increase analysis speed.
> If necessary I can modify the queue sizes (work queue, consumer queue).
> Any suggestions?
> Roberto
> --
> Roberto Franchini
> http://www.celi.it
> http://www.blogmeter.it
> http://www.memesphere.it
> Tel +39-011-6600814
> jabber:[email protected] skype:ro.franchini
>
