Roberto Franchini wrote:
> hi,
> I'm running collection processing on a quad core with 8G of RAM.
> I run the CPE with 4 threads and I tried various sizes for the CAS
> pool. I patched the CPE so now I can set a CAS pool size bigger than 60,
> but the result is always the same:
> The CAS processor pool is empty. (Thread Name: [Procesing Pipeline#4
> Thread]::) Total size: 1 Free in pool: 0
>
> This happens with 60, 100, or even 240!
> Actually this pipeline is able to analyze 60k documents per hour. It's
> good, but I hope to reach 100k docs/h. The old one (not UIMA-based)
> did 24000 docs/h single-threaded and I was able to run 3 pipelines in
> parallel (three processors).
> This new pipeline does more work, and I'm able to run 2 pipelines in
> different processes to achieve 90k docs/h. To run 2 pipelines I
> have to limit threads to 2.
> The major limiting factor is the creation of a lot of temporary
> objects, so this is the JVM configuration to mitigate this:
> -Xmx3072M -XX:NewSize=1024M \
> -XX:ParallelGCThreads=4
> -XX:+UseParallelOldGC -XX:-UseGCOverheadLimit \
> -XX:+DisableExplicitGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps"
>
> I wonder if there's a right CAS pool size to increase the analysis speed.
> If necessary I can modify queue sizes (work queue, consumer queue).
> Any suggestion?
> Roberto

Personally, I have long since given up using the CPE. When I need to scale on a single machine like you do, I do document ingestion outside of UIMA and instantiate my processing pipeline n times myself. Then I have an input queue with documents, and do the output from a CAS consumer or such.

Each processing pipeline has exactly one CAS. It processes a document and, when done, gets the next one from the input queue. Repeat until the queue is empty. You need to do a little threading yourself this way, but it's really minimal, and you can forget about fiddling with CPE parameters.

There may be times when this approach is not appropriate, but I have yet to see such a case :-)

--Thilo
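
For illustration, here is a minimal sketch of the queue-plus-n-pipelines pattern described above. It is not the actual code: `PipelinePool`, `Pipeline`, `newPipeline` and the token-counting logic are hypothetical stand-ins so the example runs without UIMA on the classpath. In a real setup each worker would own one analysis engine and one CAS, and reset that CAS between documents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class PipelinePool {

    /** Hypothetical stand-in for one analysis engine together with its single CAS. */
    interface Pipeline {
        int process(String documentText);
    }

    /** Toy pipeline (assumption): counts whitespace-separated tokens. */
    static Pipeline newPipeline() {
        return text -> {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        };
    }

    /** End-of-input marker; compared by identity, so no real document can match it. */
    private static final String POISON = new String("<end-of-input>");

    /** Runs nThreads workers, each with its own pipeline, over a shared input queue. */
    static Map<String, Integer> run(List<String> documents, int nThreads)
            throws InterruptedException {
        BlockingQueue<String> input = new LinkedBlockingQueue<>();
        // Plays the role of the output side; keyed by document text, so
        // duplicate documents collapse into one entry in this toy version.
        Map<String, Integer> results = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                Pipeline pipeline = newPipeline(); // exactly one per thread
                try {
                    while (true) {
                        String doc = input.take();  // blocks until a document arrives
                        if (doc == POISON) {
                            break;                  // no more input for this worker
                        }
                        results.put(doc, pipeline.process(doc));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "pipeline-" + i);
            workers[i].start();
        }

        // Document ingestion happens outside the pipelines.
        for (String doc : documents) {
            input.put(doc);
        }
        // One marker per worker so every thread terminates.
        for (int i = 0; i < nThreads; i++) {
            input.put(POISON);
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = Arrays.asList("one two", "three four five", "six");
        System.out.println(run(docs, 4));
    }
}
```

Since each thread only ever touches its own pipeline, there is no pool to size. The only shared state is the input queue and whatever collects the output.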
