Yes, the overhead of connecting to a short-run-time remote may be a concern. In the system you describe, and in my assumptions about scalability, it seems like the bulk of a single pipeline would be distributed out to worker nodes in some nested fashion. So, the master CPE might just distribute the CAS to a first-level set of remotes, which may act as distributors to second-level remotes doing the work (or maybe that's the role of the 15*32 threads running in the master CPE ... there isn't a lot of work for them to do other than send out and wait for a response, so you may only need a single-level distributor). The time savings would be in keeping the AEs close to each other and using the architecture to move work out to those nodes. Am I headed in the right direction?

So, keeping the current features and adding a once-per-process re-connect for that master CPE to use would be a small step in the right direction. (Unless this line of discussion is moot given the post about UIMA AS.)

- Charles
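To make the once-per-process re-connect idea above concrete, here is a rough sketch of the kind of client wrapper it implies. This is not the UIMA/Vinci API; ReconnectingRemoteClient, ServiceLocator, RemoteConnection, and Cas are made-up names standing in for whatever the real client classes would be.

import java.io.IOException;

// Hypothetical stand-ins, NOT real UIMA/Vinci classes.
interface Cas { }

interface RemoteConnection {
    boolean isOpen();
    void process(Cas cas) throws IOException;
}

interface ServiceLocator {
    RemoteConnection connect(String serviceName) throws IOException;
}

public class ReconnectingRemoteClient {

    private final String serviceName;
    private final ServiceLocator locator;
    private RemoteConnection conn;   // created lazily, reused across calls

    public ReconnectingRemoteClient(String serviceName, ServiceLocator locator) {
        this.serviceName = serviceName;
        this.locator = locator;
    }

    /** Connect once per process; re-resolve only when the old connection is gone. */
    private synchronized RemoteConnection connection() throws IOException {
        if (conn == null || !conn.isOpen()) {
            conn = locator.connect(serviceName);   // the once-per-process re-connect
        }
        return conn;
    }

    public void processCas(Cas cas) throws IOException {
        try {
            connection().process(cas);
        } catch (IOException e) {
            // Drop the stale connection so the next call re-resolves the service;
            // that is also how remotes that come online later would get picked up.
            synchronized (this) {
                conn = null;
            }
            throw e;
        }
    }
}

The point of the sketch is that the re-resolve happens only on the first call and after a failure, not on every processCas(), so per-call overhead stays near zero while the client can still find services that move or restart.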
> Date: Wed, 30 Apr 2008 09:48:56 -0700
> To: [email protected]
> From: [EMAIL PROTECTED]
> Subject: RE: Server Socket Timeout Woes
>
> Thanks for the kind words - it's all been out of necessity, not some grand scheme! I too have thought about the load-balancing aspects of the 'job scheduler.' Even without the ability to add/subtract additional resources (a nice feature), it seems that the current setup is missing some other niceties as well.
>
> I find that in order for all of our nodes to be used, I have to 'overshoot' the number of instances I'd really like to process. This is because if, say, I had 10 worker nodes and I started 10 instances, there's a good chance some of the workers would get 2 instances, or more, while others would get 0. So I oversaturate the lines and hope for the best.
>
> I think, as has been said in this thread, perhaps the best bet would be to allow a thread to get a resource simply for the length of a single processCas(), then release it back to the pool. I suppose there are some overhead issues with this? But at least you wouldn't worry about wasting so many threads all of the time. Maybe a few different options, such as the current setup, a new thread per processCas, and maybe a way to gain priority? So if you're constantly "checking out" the same type of thread, you're allowed to hold on to a longer "lease" of that thread, and overhead time goes down? Something like DHCP, but for worker threads :) Of course, that might be too complicated and not worth the effort.
>
> It seems like taking a resource just long enough to perform one block of work (one processCas) is the simplest and most 'tried-and-true' form. However, at least in most of our work, each processCas is really pretty quick, so it would look like a lot of overhead for switching threads around all of the time. Of course 'pretty quick' is relative, and in computer-time is closer to an eternity. But we're averaging 100s to 1000s of documents per second, so if we're ALWAYS setting up and tearing down, that could eat into our efficiency.
>
> These are just some of my thoughts; anyone have any ideas?
>
> Steve
>
> At 10:22 AM 4/29/2008, you wrote:
> > I'm excited to see this thread for its affirmation that someone has pushed Vinci scalability to the point that Steve has at LLNL, and to know that the currently released version has some limitations. At the risk of diverting this thread, let me share what we've found.
> >
> > I'm on board with Adam's line of thinking. We've just spent 2 weeks experimenting with the various options for exclusive/random allocation of Vinci services, finding that 'exclusive' is the most reliable way to balance load (random sometimes hands all of the clients the same service while other services go unused). The phrase "when a service is needed" isn't clear in the documentation. As Adam indicated, our finding is that "need" occurs only at client thread initialization time as opposed to each process(CAS) call. Additionally, "exclusive" is not exactly clear, as two client threads can be handed the same service if the number of services available is less than the number of threads initializing.
> > This behavior is robust (better to get a remote than have nothing allocated), but it isn't clear from our relatively small setup (two threads, two remotes) what the word 'exclusive' means or how large a system can get before 'exclusive' pans out as the right/wrong approach.
> >
> > In the face of services starting/stopping on remote computers (e.g., during multi-platform reboot), there seems to be no way to robustly take advantage of additional services coming on-line. If "when needed" meant each process(CAS) call (as an option at least ... to trade the re-connect negotiation overhead for dynamic scalability), then a system that initializes to 5 remotes can balance out as 10, 20, 30 remotes come online. For now, we are using the CPE 'numToProcess' parameter to exit the CPE, then construct a new CPE and re-enter the process() routine to seek out new services periodically.
> >
> > Also, we are seeing a startup sequence that sometimes results in the first document sent to each remote returning immediately with a connection/timeout exception ... so we catch those items and re-submit them at the end of the queue in case they really did exit due to a valid timeout exception.
> >
> > Any feedback/collaboration would be appreciated.
> >
> > - Charles
> >
> > > Date: Wed, 23 Apr 2008 17:44:50 -0400
> > > From: [EMAIL PROTECTED]
> > > To: [email protected]
> > > Subject: Re: Server Socket Timeout Woes
> > >
> > > On Wed, Apr 23, 2008 at 4:39 PM, Steve Suppe <[EMAIL PROTECTED]> wrote:
> > > > Hello again,
> > > >
> > > > I think you are 100% right here. I managed to roll back to my patched version of UIMA 2.1.0. In this one, I implemented the pool of threads as automatically expandable. This seemed to solve all of our problems, and things are chugging away very happily now.
> > > >
> > > > I know this is the user group, but is this something I should look to contributing somehow?
> > >
> > > Definitely - you could open a JIRA issue and attach a patch. We should probably think a bit about how this thread pool was supposed to work, though. My first thought is that the clients would round-robin over the available threads, and each thread would be used for only one request and would then be relinquished back into the pool. But instead, it looks like the client holds onto a thread for the entire time that client is connected, which doesn't make a whole lot of sense. If the thread pool worked in a more sensible way, it might not need to be expandable.
> > >
> > > -Adam
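The one-request-per-checkout behavior described in the quoted thread above (taking a resource only for the length of a single processCas(), then relinquishing it back into the pool) boils down to a pool with a strict borrow/return discipline. A minimal sketch, assuming nothing about the actual Vinci thread pool (Connection and PerCallConnectionPool are made-up names, and the CAS is just an Object here):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for whatever the transport layer hands back.
interface Connection {
    void process(Object cas) throws IOException;
}

public class PerCallConnectionPool {

    // Idle connections; take() blocks when every connection is busy.
    private final BlockingQueue<Connection> idle = new LinkedBlockingQueue<>();

    public PerCallConnectionPool(Iterable<Connection> connections) {
        for (Connection c : connections) {
            idle.add(c);
        }
    }

    /** Borrow a connection for exactly one call, then return it, even on failure. */
    public void processCas(Object cas) throws IOException, InterruptedException {
        Connection c = idle.take();     // cycles through connections FIFO-style
        try {
            c.process(cas);
        } finally {
            idle.add(c);                // relinquish after a single request
        }
    }
}

The "DHCP-style lease" idea would amount to letting a caller that keeps coming back hold its connection across several calls before returning it; that trades some of the fairness the queue gives you for lower checkout overhead, which only matters when processCas() is very short compared to the cost of the take/add round trip.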
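And the "catch the first-document connection/timeout exception and re-submit the item at the end of the queue" workaround mentioned in the quoted message above could be as simple as something like this (again only a sketch; Remote is a made-up interface, and in the real CPE this logic would hang off whatever error reporting the framework exposes):

import java.io.IOException;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RequeueOnTimeout {

    // Hypothetical stand-in for a remote service call.
    interface Remote {
        void process(String docId) throws IOException;
    }

    /** Drain the work queue, pushing items that fail back to the end for another try. */
    public static void drain(Deque<String> work, Remote remote, int maxAttempts) {
        Map<String, Integer> attempts = new HashMap<>();
        while (!work.isEmpty()) {
            String doc = work.removeFirst();
            try {
                remote.process(doc);
            } catch (IOException connectOrTimeout) {
                int n = attempts.merge(doc, 1, Integer::sum);
                if (n < maxAttempts) {
                    work.addLast(doc);   // retry later, once the remote has settled
                } else {
                    // Give up: treat it as a genuine timeout rather than a startup hiccup.
                    System.err.println("Dropping " + doc + " after " + n + " attempts");
                }
            }
        }
    }
}

Calling drain(new ArrayDeque<>(docIds), remote, 3) would give each document a couple of chances to get past the startup hiccup before being written off as a real timeout.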
