Yes, the overhead of connecting to a short-run-time remote may be a concern. In the system you describe, and in my assumptions about scalability, it seems like the bulk of a single pipeline would be distributed out to worker nodes in some nested fashion. So, the master CPE might just distribute the CAS to a first-level set of remotes, which may act as distributors to second-level remotes doing the work (or maybe that's the role of the 15*32 threads running in the master CPE ... there isn't a lot of work for them to do other than send out and wait for a response, so you may only need a single-level distributor). The time savings would be in keeping the AEs close to each other and using the architecture to move work out to those nodes. Am I headed in the right direction?

So, keeping the current features and adding a once-per-process re-connect for that master CPE to use would be a small step in the right direction. (Unless this line of discussion is moot given the post about UIMA AS.)

- Charles
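To make the once-per-process re-connect idea above concrete, here is a rough sketch of the kind of client wrapper it implies. This is not the UIMA/Vinci API; ReconnectingRemoteClient, ServiceLocator, RemoteConnection, and Cas are made-up names standing in for whatever the real client classes would be.

import java.io.IOException;

// Hypothetical stand-ins, NOT real UIMA/Vinci classes.
interface Cas { }

interface RemoteConnection {
    boolean isOpen();
    void process(Cas cas) throws IOException;
}

interface ServiceLocator {
    RemoteConnection connect(String serviceName) throws IOException;
}

public class ReconnectingRemoteClient {

    private final String serviceName;
    private final ServiceLocator locator;
    private RemoteConnection conn;   // created lazily, reused across calls

    public ReconnectingRemoteClient(String serviceName, ServiceLocator locator) {
        this.serviceName = serviceName;
        this.locator = locator;
    }

    /** Connect once per process; re-resolve only when the old connection is gone. */
    private synchronized RemoteConnection connection() throws IOException {
        if (conn == null || !conn.isOpen()) {
            conn = locator.connect(serviceName);   // the once-per-process re-connect
        }
        return conn;
    }

    public void processCas(Cas cas) throws IOException {
        try {
            connection().process(cas);
        } catch (IOException e) {
            // Drop the stale connection so the next call re-resolves the service;
            // that is also how remotes that come online later would get picked up.
            synchronized (this) {
                conn = null;
            }
            throw e;
        }
    }
}

The point of the sketch is that the re-resolve happens only on the first call and after a failure, not on every processCas(), so per-call overhead stays near zero while the client can still find services that move or restart.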
> Date: Wed, 30 Apr 2008 09:48:56 -0700
> To: [email protected]
> From: [EMAIL PROTECTED]
> Subject: RE: Server Socket Timeout Woes
>
> Thanks for the kind words - it's all been out of necessity, not some grand scheme! I too have thought about the load-balancing aspects of the 'job scheduler.' Even without the ability to add/subtract additional resources (a nice feature), it seems that the current setup is missing some other niceties as well.
>
> I find that in order for all of our nodes to be used, I have to 'overshoot' the number of instances I'd really like to process. This is because if, say, I had 10 worker nodes and I started 10 instances, there's a good chance some of the workers would get 2 instances, or more, while others would get 0. So I oversaturate the lines and hope for the best.
>
> I think, as has been said in this thread, perhaps the best bet would be to allow a thread to get a resource simply for the length of a single processCas(), then release it back to the pool. I suppose there are some overhead issues with this? But at least you wouldn't worry about wasting so many threads all of the time. Maybe a few different options, such as the current setup, a new thread per processCas, and maybe a way to gain priority? So if you're constantly "checking out" the same type of thread, you're allowed to hold on to a longer "lease" of that thread, and overhead time goes down? Something like DHCP, but for worker threads :) Of course, that might be too complicated and not worth the effort.
>
> It seems like taking a resource just long enough to perform one block of work (one processCas) is the simplest and most 'tried-and-true' form. However, at least in most of our work, each processCas is really pretty quick, so it would look like a lot of overhead for switching threads around all of the time. Of course 'pretty quick' is relative, and in computer-time is closer to an eternity. But we're averaging 100s to 1000s of documents per second, so if we're ALWAYS setting up and tearing down, that could eat into our efficiency.
>
> These are just some of my thoughts; anyone have any ideas?
>
> Steve
>
> At 10:22 AM 4/29/2008, you wrote:
> > I'm excited to see this thread for its affirmation that someone has pushed Vinci scalability to the point that Steve has at LLNL, and to know that the currently released version has some limitations. At the risk of diverting this thread, let me share what we've found.
> >
> > I'm on board with Adam's line of thinking. We've just spent 2 weeks experimenting with the various options for exclusive/random allocation of Vinci services, finding that 'exclusive' is the most reliable way to balance load (random sometimes hands all of the clients the same service while other services go unused). The phrase "when a service is needed" isn't clear in the documentation. As Adam indicated, our finding is that "need" occurs only at client thread initialization time as opposed to each process(CAS) call. Additionally, "exclusive" is not exactly clear, as two client threads can be handed the same service if the number of services available is less than the number of threads initializing.
> > This behavior is robust (better to get a remote than have nothing allocated), but it isn't clear from our relatively small setup (two threads, two remotes) what the word 'exclusive' means or how large a system can get before 'exclusive' pans out as the right/wrong approach.
> >
> > In the face of services starting/stopping on remote computers (e.g., during multi-platform reboot), there seems to be no way to robustly take advantage of additional services coming on-line. If "when needed" meant each process(CAS) call (as an option at least ... to trade the re-connect negotiation overhead for dynamic scalability), then a system that initializes to 5 remotes can balance out as 10, 20, 30 remotes come online. For now, we are using the CPE 'numToProcess' parameter to exit the CPE, then construct a new CPE and re-enter the process() routine to seek out new services periodically.
> >
> > Also, we are seeing a startup sequence that sometimes results in the first document sent to each remote returning immediately with a connection/timeout exception ... so we catch those items and re-submit them at the end of the queue in case they really did exit due to a valid timeout exception.
> >
> > Any feedback/collaboration would be appreciated.
> >
> > - Charles
> >
> > > Date: Wed, 23 Apr 2008 17:44:50 -0400
> > > From: [EMAIL PROTECTED]
> > > To: [email protected]
> > > Subject: Re: Server Socket Timeout Woes
> > >
> > > On Wed, Apr 23, 2008 at 4:39 PM, Steve Suppe <[EMAIL PROTECTED]> wrote:
> > > > Hello again,
> > > >
> > > > I think you are 100% right here. I managed to roll back to my patched version of UIMA 2.1.0. In this one, I implemented the pool of threads as automatically expandable. This seemed to solve all of our problems, and things are chugging away very happily now.
> > > >
> > > > I know this is the user group, but is this something I should look to contributing somehow?
> > >
> > > Definitely - you could open a JIRA issue and attach a patch. We should probably think a bit about how this thread pool was supposed to work, though. My first thought is that the clients would round-robin over the available threads, and each thread would be used for only one request and would then be relinquished back into the pool. But instead, it looks like the client holds onto a thread for the entire time that client is connected, which doesn't make a whole lot of sense. If the thread pool worked in a more sensible way, it might not need to be expandable.
> > >
> > > -Adam
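The one-request-per-checkout behavior described in the quoted thread above (taking a resource only for the length of a single processCas(), then relinquishing it back into the pool) boils down to a pool with a strict borrow/return discipline. A minimal sketch, assuming nothing about the actual Vinci thread pool (Connection and PerCallConnectionPool are made-up names, and the CAS is just an Object here):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for whatever the transport layer hands back.
interface Connection {
    void process(Object cas) throws IOException;
}

public class PerCallConnectionPool {

    // Idle connections; take() blocks when every connection is busy.
    private final BlockingQueue<Connection> idle = new LinkedBlockingQueue<>();

    public PerCallConnectionPool(Iterable<Connection> connections) {
        for (Connection c : connections) {
            idle.add(c);
        }
    }

    /** Borrow a connection for exactly one call, then return it, even on failure. */
    public void processCas(Object cas) throws IOException, InterruptedException {
        Connection c = idle.take();     // cycles through connections FIFO-style
        try {
            c.process(cas);
        } finally {
            idle.add(c);                // relinquish after a single request
        }
    }
}

The "DHCP-style lease" idea would amount to letting a caller that keeps coming back hold its connection across several calls before returning it; that trades some of the fairness the queue gives you for lower checkout overhead, which only matters when processCas() is very short compared to the cost of the take/add round trip.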
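And the "catch the first-document connection/timeout exception and re-submit the item at the end of the queue" workaround mentioned in the quoted message above could be as simple as something like this (again only a sketch; Remote is a made-up interface, and in the real CPE this logic would hang off whatever error reporting the framework exposes):

import java.io.IOException;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class RequeueOnTimeout {

    // Hypothetical stand-in for a remote service call.
    interface Remote {
        void process(String docId) throws IOException;
    }

    /** Drain the work queue, pushing items that fail back to the end for another try. */
    public static void drain(Deque<String> work, Remote remote, int maxAttempts) {
        Map<String, Integer> attempts = new HashMap<>();
        while (!work.isEmpty()) {
            String doc = work.removeFirst();
            try {
                remote.process(doc);
            } catch (IOException connectOrTimeout) {
                int n = attempts.merge(doc, 1, Integer::sum);
                if (n < maxAttempts) {
                    work.addLast(doc);   // retry later, once the remote has settled
                } else {
                    // Give up: treat it as a genuine timeout rather than a startup hiccup.
                    System.err.println("Dropping " + doc + " after " + n + " attempts");
                }
            }
        }
    }
}

Calling drain(new ArrayDeque<>(docIds), remote, 3) would give each document a couple of chances to get past the startup hiccup before being written off as a real timeout.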
