Hi,
I'm running collection processing on a quad core with 8G of RAM.
I run the CPE with 4 threads and I tried various sizes for the CAS
pool. I patched the CPE so that I can now set a CAS pool size bigger than 60,
but the result is always the same:
The CAS processor pool is empty. (Thread Name: [Procesing Pipeline#4
Thread]::) Total size: 1 Free in pool: 0
This happens with 60, 100, or even 240!
Currently this pipeline is able to analyze 60k documents per hour. That's
good, but I hope to reach 100k docs/h. The old one (not UIMA-based)
did 24000 docs/h single-threaded, and I was able to run 3 pipelines in
parallel (three processors).
This new pipeline does more work, and I'm able to run 2 pipelines in
separate processes to achieve 90k docs/h. To run 2 pipelines I
have to limit each one to 2 threads.
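To put these numbers side by side, here is a small sketch of the arithmetic. The class and method names are made up for illustration and have nothing to do with the UIMA API; the figure for the old setup assumes its three pipelines scaled linearly, which I have not measured.

```java
public class ThroughputMath {
    // Aggregate throughput of identical pipelines running in parallel,
    // ASSUMING linear scaling (an assumption, not a measurement).
    static long aggregate(long docsPerHourEach, int pipelines) {
        return docsPerHourEach * pipelines;
    }

    // Percentage increase needed to go from the current to the target throughput.
    static double requiredGainPercent(long current, long target) {
        return 100.0 * (target - current) / current;
    }

    public static void main(String[] args) {
        long oldTotal = aggregate(24_000, 3); // old pipeline, 3 parallel processes
        long newSingle = 60_000;              // one CPE, 4 threads
        long newDual = 90_000;                // two CPE processes, 2 threads each
        long target = 100_000;                // the throughput I hope to reach

        System.out.println("old setup (assumed linear): " + oldTotal + " docs/h");
        System.out.println("gain needed from one CPE (%): "
                + requiredGainPercent(newSingle, target));
        System.out.println("gain needed from two CPEs (%): "
                + requiredGainPercent(newDual, target));
    }
}
```

So the two-process setup is already much closer to the target than the single CPE, which is why I am asking whether one CPE can be tuned further.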
The major limiting factor is the creation of a lot of temporary
objects, so this is the JVM configuration I use to mitigate it:
-Xmx3072M -XX:NewSize=1024M \
-XX:ParallelGCThreads=4 \
-XX:+UseParallelOldGC -XX:-UseGCOverheadLimit \
-XX:+DisableExplicitGC -XX:+PrintGCDetails \
-XX:+PrintGCTimeStamps
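To check whether garbage collection really is the limiting factor, something like the following could be run inside the pipeline's JVM. It is only a sketch: the class name GcOverhead is hypothetical, and it uses the standard java.lang.management beans rather than anything from UIMA. The reported collection time is approximate elapsed time, so treat the resulting fraction as a rough indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    // Sum of the accumulated collection time (ms) over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of JVM uptime spent in GC; returns 0 if uptime is not positive.
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        // Print the flags the JVM actually received, to confirm the options were applied.
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("max heap (MB): "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("GC time: " + gc + " ms of " + uptime + " ms uptime");
        System.out.println("GC fraction: " + gcFraction(gc, uptime));
    }
}
```

If the fraction stays small while the pool-empty message keeps appearing, the bottleneck is probably somewhere other than the garbage collector.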
I wonder if there's a right CAS pool size to increase the analysis speed.
If necessary I can modify the queue sizes (work queue, consumer queue).
Any suggestions?
Roberto
--
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
skype:ro.franchini