On 04/12/08 17:45, Mathieu Arnold wrote:
> +-le 04.12.2008 12:42:09 -0500, Milenko a dit :
> |>  -----Original Message-----
> |>  From: [EMAIL PROTECTED] [mailto:tilesathome-
> |>  [EMAIL PROTECTED] On Behalf Of Mathieu Arnold
> |>  Sent: Thursday, December 04, 2008 10:24 AM
> |>  To: 'TilesAtHome'
> |>  Subject: Re: [Tilesathome] Two new ROMA...
> |>
> |>  +-le 04.12.2008 10:12:09 -0500, Milenko a dit :
> |>  | OK - the map.fcgi that I have I just downloaded from florians server,
> |>  so
> |>  | that version does need to be updated.
> |>
> |>  Yes, that should end up somewhere in the svn :-)
> |>
> |>  | Yes I see that.  It's returning results in under a second or two at
> |>  the
> |>  | moment.  Most of the current tiles look empty or pretty sparse
> |>  though.
> |>  | We'll see what happens when missingtiles runs next.
> |>
> |>  The thing is that if possible, the clients that were hitting your
> |>  server
> |>  directly would be really better served by the load balancer, so that
> |>  the load
> |>  balancer does not set your server as being down because the concurrency
> |>  limit
> |>  is reached on your end. (You could also do as I did, set the
> |>  concurrency
> |>  around 20 and let the LB do the job.)
> |>
> |>  --
> |>  Mathieu Arnold
> |
> | Could you drop the LB to something more like 8 instead of 10?  Those extra
> | two requests really slow things down.
>
> It was at 9, it's at 8 now.
>
> | Do you have any idea what causes the large groups on requests all at one
> | time?  I'm seeing a pattern of no new requests for a minute or so and then
> | 4-7 new requests all within a second or two.  Is this by design on the LB?
> | If so, spreading these requests out would probably make all of the ROMA
> | servers more efficient.
>
> Well, it's a bit hard to debug things with not much informations :-)
> It may be because your server is marked down, thus no hit, and comes up, thus
> you getting assigned what's left in the queue you can get :-)
>    

I think that is exactly what is happening.  As soon as the server comes
back up, the LB will assign it the full 8 requests, up to maxconn, if
there are still requests left in the queue.  I think there is an option
called slowstart / warmup or something like that, which lets you ramp
up the requests slowly once the server comes back online.  The main
problem, however, is that your server keeps getting marked as down.  In
20 minutes it has been down 19 times, i.e. basically whenever it was
assigned 9 requests, at least one of them failed with 503, triggering
the server to be marked down.  Can you tell how many requests hit your
server that do not come through the load balancer?  Another possibility
is that there are a few stale pid files left in the directory that
map.fcgi uses to determine the concurrency.  It would then think there
are more requests in progress than there actually are.  Could you
check whether there are any stale pids left in your $stampdir?  One
reason for stale pid files is that fcgi scripts can get killed if they
run too long, leaving the script no time to clean up after itself.  On
my Ubuntu system this timeout is only 40 seconds, which is far too
short for large requests.
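
If it helps, here is a rough way to look for stale pid files.  It is
only a sketch: I am assuming that each file in $stampdir contains a
process id as plain text, which may not be how map.fcgi actually
stores them, and the function names are made up for illustration.  It
uses signal 0 to test whether a process exists, so it needs a
POSIX system.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_pidfiles(stampdir):
    """List files in stampdir whose recorded PID no longer exists.

    Assumption: each file holds one PID as text. Files that do not
    match that format are skipped rather than reported.
    """
    stale = []
    for name in sorted(os.listdir(stampdir)):
        path = os.path.join(stampdir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as fh:
                pid = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or not a PID file in the assumed format
        if pid <= 0:
            continue  # PIDs <= 0 address process groups, never test those
        if not pid_alive(pid):
            stale.append(path)
    return stale
```

Calling find_stale_pidfiles() on your stamp directory gives the list
of files that could be deleted.  Note that a PID can be reused by an
unrelated process, so a file reported as live is not proof that the
original request is still running.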

Also, your check for the stale db does not seem ideally placed.  As the
check happens rather late, the load balancer's health check requests
come back claiming everything is fine, while all real requests fail
with 503.  Would it be possible to move the db check before the
healthcheck bbox check?
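
To illustrate the ordering I have in mind, here is a small sketch.  It
is not the actual map.fcgi logic: the function name, its parameters
and the idea that the health check answers before the concurrency
limit is applied are all assumptions on my part.

```python
def handle_request(bbox, db_is_stale, active_requests,
                   max_concurrency, healthcheck_bbox):
    """Return the HTTP status a request would get (hypothetical sketch)."""
    # 1. Stale db first, so that health checks fail too and the load
    #    balancer stops sending real requests to this server.
    if db_is_stale:
        return 503
    # 2. Only then answer the load balancer's health check bbox.
    if bbox == healthcheck_bbox:
        return 200
    # 3. Real requests are still subject to the concurrency limit.
    if active_requests >= max_concurrency:
        return 503
    return 200
```

With this order, a stale db makes the health check itself return 503,
so the LB sees the server as down instead of as healthy but failing.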


Kai



_______________________________________________
Tilesathome mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/tilesathome