--
David Hancock | [EMAIL PROTECTED] | 410-266-4384
-----Original Message-----
From: Geoffrey Talvola [mailto:[EMAIL PROTECTED]
Sent: Monday, September 29, 2003 2:23 PM
To: [EMAIL PROTECTED]
Cc: Adamshick, Greg (GADAMSHI); Bugenhagen, John (JBUGENHA); Kancianic, Jennifer C. (JKANCIAN); Parangot, Reena M. (RMP); 'Hancock, David (DHANCOCK)'
Subject: RE: [Webware-discuss] Webware/WebKit and load-balancing
-----Original Message-----
From: Hancock, David (DHANCOCK) [mailto:[EMAIL PROTECTED]
Sent: Saturday, September 27, 2003 4:37 PM
To: [EMAIL PROTECTED]
Cc: Adamshick, Greg (GADAMSHI); Bugenhagen, John (JBUGENHA); Kancianic, Jennifer C. (JKANCIAN); Parangot, Reena M. (RMP)
Subject: [Webware-discuss] Webware/WebKit and load-balancing

Has anyone in Webware-land been successful implementing a load-balancer between Apache and one or more WebKit instances? I've been trying to do this for many weeks without success. I wrote about my problems a while ago, but I still haven't had any luck.
Hmmm. Do you really mean load balancing between a single instance of Apache and multiple WebKit instances? Ordinarily I would assume the load balancer would go _before_ Apache and would load balance between multiple Apache instances, each of which used mod_webkit to talk to their own single instance of WebKit.
>>> Single Apache to multiple WebKits, with multiple Apaches load balanced. The main idea is to retain the current model (load-balanced Apaches, each with their own local WebKit instance), but add the ability for one Apache to contact an alternate WebKit if the local one seems to be down.
Pertinent information: Webware 0.8.1, Python 2.2, RedHat 7.3 (2.4.20something kernel), mod_webkit, DCOracle2 in use, also pymqi (Python binding for MQSeries middleware). We have two web/application servers, each running Apache and each running WebKit. We use a Cisco LocalDirector listening on a virtual IP and load-balancing (and failing out) the Apache servers. Once the LocalDirector binds to the real IP of a web/app server, that server's Apache and WebKit handle the request. Sessions use the File store, and are NFSed so that either server can get a request and handle the session.
I'm not sure if the File store is "process-safe" -- is it possible for 2 processes to step on one another's sessions? That's something you might want to look into. (But probably not related to the wedging you're seeing.)
>>> We haven't seen any issues relating to the File store, or at least not recognized as such. I'll keep this in mind, though--it sounds like it could be heck to troubleshoot. It's nice not having to worry about "affinity" for sessions, though. I benchmarked File vs. Dynamic for the session store, and File on our systems was only about 10% slower. Adding some sort of locking might make that worse, but still shouldn't be too bad.
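For what it's worth, a minimal sketch of what "some sort of locking" around a session file could look like. This is not the File store's actual code: save_session and load_session are hypothetical names, it is written for a current Python rather than 2.2, and it assumes POSIX advisory locks via fcntl.lockf. Whether such locks are honored between two NFS clients depends on the kernel and the NFS lock daemon, so that part would need testing on the real servers.

```python
import fcntl
import os
import pickle


def save_session(path, data):
    """Write a session dict while holding an exclusive advisory lock."""
    # Open without truncating, so a reader that already holds the
    # shared lock never sees a half-empty file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)      # blocks until no one else holds it
        try:
            f.seek(0)
            f.truncate()                   # safe now: we own the file
            pickle.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def load_session(path):
    """Read a session dict while holding a shared advisory lock."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)      # many readers, no writer
        try:
            return pickle.load(f)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

The lock is only advisory: it protects sessions only if every process that touches the files goes through the same two functions.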
My main goal is to be able to trap, basically in real time, those cases where WebKit hangs but doesn't die. The LocalDirector does an admirable (albeit expensive) job of handling hardware failure or stopped Apache servers. But the hardware (knock wood) hasn't failed and Apache is rock-solid. But WebKit has on numerous occasions just "stopped." Unfortunately, Apache continues to handle the incoming requests, pesters the dead WebKit port 10 times, and then returns 500 Server Error to the client.
If you're comfortable hacking the C code, you could change mod_webkit's behavior. Its current behavior was intended to allow restarting WebKit without losing requests. You could modify it so that it adds fault recovery -- when it can't contact WebKit, it could attempt to restart WebKit, wait a few seconds, _then_ try again.
Another possibility is to add load balancing and fault tolerance right into mod_webkit -- when it fails to connect to one appserver it could try another.
>>> Both these ideas have a lot of merit. I'm rusty at C, but I'll "C" what I can do.
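To make the second idea concrete, here is a rough sketch of the "try one appserver, then another" logic. It is in Python rather than C and is not mod_webkit code; connect_with_failover and the address list are made up for illustration.

```python
import socket


def connect_with_failover(appservers, timeout=2.0):
    """Return (socket, address) for the first appserver that accepts.

    appservers is an ordered list of (host, port) tuples, local one first.
    """
    last_error = None
    for address in appservers:
        try:
            sock = socket.create_connection(address, timeout=timeout)
            return sock, address
        except OSError as exc:             # refused, unreachable, timed out
            last_error = exc
    raise ConnectionError("no appserver reachable: %s" % last_error)
```

Note that this only helps when the connection itself fails. An appserver that is hung but still has its port open will usually accept the connection, so a read timeout on the response would be needed as well.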
We've had good success load-balancing some outbound xmlrpc requests using (first) proxylb and then pythondirector. But when I try either of these software load-balancers, I get a 500 Server Error response and "cannot scan servlet headers" in the Apache error log. The mod_webkit.c code shows this error as coming AFTER the request has gone to the WebKit port:
. . .
    /* Now we get the response from the AppServer */
    ap_hard_timeout("wk_read", r);
    // log_message("scanning for headers", r);
    // pull out headers
    if ((ret = ap_scan_script_header_err_buff(r, buffsocket, NULL))) {
        if (ret >= 500 || ret < 0) {
            log_message("cannot scan servlet headers ", r);
            return 2;
        }
        r->status_line = NULL;
    }
. . .

Again, is this load balancing between Apache and WebKit, or is it between the web client and Apache? I assume the load balancers were only designed for the latter.
>>> Between Apache and WebKit. The pythondirector and proxylb balancers are for ANY TCP traffic, and we've used them successfully for port 80, 443, other web ports, and I've heard of SMTP, FTP, etc. being load-balanced with them.
Any ideas? I'd be grateful for either (a) which way to go with troubleshooting or (b) pointers to other solutions that have worked for failover. We're already checking for one style of hung WebKit processes and issuing a restart, but that hasn't handled every hang mode we've encountered.
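One way a monitor could tell "hung but not dead" apart from "dead" is to put a timeout on the read as well as on the connect. A minimal sketch, assuming a hypothetical probe_appserver helper; the request bytes are left to the caller, since a real probe would have to send something the appserver accepts as a valid request.

```python
import socket


def probe_appserver(address, request=b"", timeout=3.0):
    """Classify an appserver as 'down', 'hung', 'closed', or 'ok'."""
    try:
        sock = socket.create_connection(address, timeout=timeout)
    except OSError:
        return "down"                      # nothing listening, or unreachable
    try:
        sock.settimeout(timeout)
        if request:
            sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)      # tell the peer the request is complete
        reply = sock.recv(1)               # wait for the first byte of a response
    except socket.timeout:
        return "hung"                      # port open, but no answer in time
    except OSError:
        return "down"
    finally:
        sock.close()
    return "ok" if reply else "closed"     # empty read: peer closed without replying
```

A process that has stopped servicing requests often still has its listening socket open, so the connect succeeds and only the read timeout reveals the hang.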
- Geoff
--- Begin Message ---
Title: Re: Webware/WebKit and load-balancing

Your IP addressing probably has both web servers on the same subnet, but
another approach to trying the LocalDirector could be to have your
traffic go

Internet<->LD<->apache1<->LD<->WebKit2

(criss-crossing the web and app servers).
That could be done by just changing your network architecture, without
additional NICs, I think.

About the logs, I do remember reading posts that the output is buffered.
Try running your tests, then stopping WebKit (not restarting). The
logs are probably flushed to disk during clean shutdown.

"Bad marshal data" is what you get when you try to connect to the WebKit
app server port directly. Are you or some non-Apache process opening
connections to monitor the health of WebKit? I'm sure there are other
reasons this error could be showing up, though.

Good luck. I'll be curious to hear what the problem is.
Pete
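
On the "bad marshal data" point: that wording is the error Python's marshal module raises when asked to decode bytes that are not marshal data, which fits the idea that something other than an adapter is writing to the appserver port. A small illustration follows; the dictionary keys are only a guess at what a request might contain, not the documented wire format, and looks_like_marshal is a made-up helper.

```python
import marshal


def looks_like_marshal(payload):
    """Return True if the bytes start with a decodable marshal object."""
    try:
        marshal.loads(payload)
        return True
    except (ValueError, EOFError, TypeError):
        # ValueError is where the "bad marshal data" message comes from.
        return False


# What an adapter might send (keys are hypothetical):
framed = marshal.dumps({"format": "CGI", "environ": {"REQUEST_METHOD": "GET"}})

# What a health checker or browser pointed at the port would send:
raw = b"GET / HTTP/1.0\r\n\r\n"

print(looks_like_marshal(framed), looks_like_marshal(raw))
```

If a monitor is poking the port this way, a traceback right after startup would be expected even though real requests go through fine.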
Hancock, David (DHANCOCK) wrote:
> That's a good idea; I'll give it a try. It may at least help us figure out
> what the problem is. It sounds too convoluted for a long-term solution,
> maybe. I may be able to do this without an additional NIC. (We've got two
> already--we have our database servers on a separate, more "private" LAN.)
>
> The problem we're having is definitely AFTER WebKit has processed the
> request. Here's a little more information: If I run the load-balancer, any
> WebKit page that I attempt to load gives me a 500 Server Error. But if I
> restart WebKit, all the requests show up in the WebKit log, seemingly
> successful (two lines for each, the start time and request and then the
> elapsed time for the request). Interestingly (maybe the logging output is
> buffered), I never see the log lines until I restart WebKit.
>
> Sometimes, but not every time, there's a "bad marshal data" traceback in the
> WebKit log after startup (maybe with the first request), but the subsequent
> requests get logged fine. Is there a clue there? I think that the "bad
> marshal data" sometimes indicates trying to connect directly to the
> AppServer port without packaging up the request.
>
> I also tried this using wkcgi as the adapter instead of mod_webkit and got
> similar results. (Similar in that it didn't work, but got logged as working.)
> The only error message is "premature end of script headers."
>
> Cheers!
> --
> David Hancock | [EMAIL PROTECTED] | 410-266-4384
>
>
> -----Original Message-----
> From: Peter Lyons [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, September 27, 2003 11:57 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Webware/WebKit and load-balancing
>
>
> Is there any possibility of dual-homing (two NICs) your web servers such
> that the apache<->WebKit traffic routes back through the LocalDirector
> and leverages its ability to detect a hung WebWare process? It's just a
> thought, and certainly not a clean solution, but maybe it would work?
>
> Pete
--- End Message ---
