Thanks Adam for the detailed response! The statement about the number of processing pipelines vs. the CAS pool size is on page 44 of the UIMA References (version 2.2.2). Has anyone run any empirical tests on the best ratio of threads to CAS pool size? Or are there any guidelines for choosing the number of threads in a CPE?
Your comment about CAS Consumers running on a different thread is new to me. I thought that a CAS Consumer acts like any other CAS processor and is driven by the same pipeline thread that controls the AEs. If not, it would definitely cause synchronization problems. Are CAS Consumers also multi-threaded? How can I determine or configure how many threads are used to drive CAS Consumers?

Thanks a lot!

Nick

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Adam Lally
Sent: Thursday, April 16, 2009 5:04 PM
To: [email protected]
Subject: Re: Running CPE with multi-threading

Hi,

On Thu, Apr 16, 2009 at 11:22 AM, Duan, Nick <[email protected]> wrote:
> I have a set of annotators bundled as an aggregate AE and configured in
> a CPE. It runs fine with a single thread, but deadlocks with 2 or more
> threads. The AE was developed without any consideration of
> thread-safety. I am trying to find out the possible causes of the
> deadlocks, and hope to get answers to the following questions from this
> community:
>
> 1. When running a CPE with multiple threads (e.g. multiple pipelines),
> does each thread instantiate its own annotator objects or AE instance,
> or do all threads share the same instances? If the former is true, I
> think I don't have to worry about changing each of the annotators to
> make them thread-safe.

Each thread instantiates its own AE instance. So you don't have to worry about thread-safety issues within an AE instance, but you still have to worry about thread-safety for any static data that's shared across instances. Try to make sure you don't use any static fields (other than static final Strings or primitive types), and if you absolutely do need a static field, make sure all access to it is synchronized.

> 2. What's the relationship between the CAS pool size and the number of
> threads? The document indicates that the number of processing
> pipelines should be equal to or greater than the CAS pool size. I would
> think the opposite should be true. In one of the examples bundled with
> the UIMA-2.2.2 distribution, the pool size was set to 2 while the number
> of pipelines was set to 1.
>

You are right, it sounds like the documentation is wrong. Where in the documentation does it say that? The pool size should be at least as big as the number of threads, or else you would have idle threads. I don't think this would cause a deadlock, though.

It is sometimes useful to have 1 more CAS than you have processing threads, if your CAS Consumers (which run in a different thread) could benefit from running concurrently with your Analysis Engines.

-Adam
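
To make the static-field advice above concrete, here is a minimal Java sketch. It is not UIMA code: `TokenCountingAnnotator`, `process`, and `runDemo` are hypothetical names invented for illustration, and the class only imitates the situation described in the thread, where each pipeline thread owns its own instance but all instances share one static map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an annotator; this is NOT a real UIMA class.
final class TokenCountingAnnotator {
    // Shared by every instance, and therefore by every pipeline thread.
    // All reads and writes go through a synchronized block on the map.
    private static final Map<String, Integer> SHARED_COUNTS = new HashMap<>();

    // Per-instance state: safe without locking, because each thread
    // is assumed to own exactly one instance.
    private int processed = 0;

    void process(String token) {
        processed++;
        synchronized (SHARED_COUNTS) {
            SHARED_COUNTS.merge(token, 1, Integer::sum);
        }
    }

    int processedByThisInstance() {
        return processed;
    }

    static int sharedCount(String token) {
        synchronized (SHARED_COUNTS) {
            return SHARED_COUNTS.getOrDefault(token, 0);
        }
    }

    // Starts `threads` workers, each with its own instance, and has each one
    // process `token` `perThread` times. Returns the shared count afterwards.
    static int runDemo(int threads, int perThread, String token) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            TokenCountingAnnotator annotator = new TokenCountingAnnotator();
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    annotator.process(token);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sharedCount(token);
    }
}
```

The first call to `TokenCountingAnnotator.runDemo(4, 1000, "the")` in a JVM returns 4000. Without the `synchronized` blocks, concurrent updates to the plain `HashMap` could be lost or could corrupt the map.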

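
The point about idle threads when the CAS pool is smaller than the thread count can be illustrated with a deliberately simplified model. This is an assumption-laden sketch, not how the CPE is implemented: `CasPoolSketch` and `peakConcurrency` are hypothetical names, a plain `Object` stands in for a CAS, and the separate collection reader and CAS Consumer threads are left out. Each worker must borrow an object from a fixed-size pool before it can do any work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of "threads borrowing from a fixed pool"; NOT UIMA code.
final class CasPoolSketch {
    // Returns the largest number of workers that held a pooled object
    // at the same time during the run.
    static int peakConcurrency(int poolSize, int threads, int itemsPerThread) {
        BlockingQueue<Object> pool = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Object()); // stand-in for a pooled CAS
        }
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < itemsPerThread; i++) {
                        Object cas = pool.take();          // blocks while the pool is empty
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(1);                   // pretend to analyze a document
                        inUse.decrementAndGet();
                        pool.put(cas);                     // hand the object back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return peak.get();
    }
}
```

In this model the peak can never exceed the smaller of the pool size and the thread count, so `peakConcurrency(1, 4, 5)` returns 1: three of the four threads are always waiting. Under the same assumptions, making the pool larger than the thread count adds nothing for the workers themselves, which fits the suggestion that an extra CAS only helps when another thread, such as a CAS Consumer, can use it.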