Re: varnish 2.0.4 backend errors

2009-07-16 Thread Lazy
2009/7/16 Lazy :
> 2009/7/16 Ken Brownfield :
>> Yeah, this one might be one for the Varnish devs.  Only last thing I can
>> think of is that Apache is returning a response that's larger than
>> Content-Length: and Varnish is just closing the connection...
> apache didn't set Content-Length on thiese requests
>
> Yesterday i switched the machine to squid, and errors came back as
> httpAppendBody: Request not yet fully sent "POST in squid cache log
>
> Finally I realized that there are many tcp checksum errors in the dump,
> I turned off tx and rx checksum offloading and now the checksums are ok,
> there was so far no error in 25 minutes (before it was happening in about
> 15 minute intervals). Maybe the fact that local backend is bound to eth0's ip
> address made a difference. I will investigate later. Maybe it will be
> worth mentioning
> in the wiki if it's proven.
>
> After all it wasn't a varnish error at all.
>
> Thank You all for your time and expertise.

Arghh said too soon, traffic went up and there are still thiese
errors, but no so many
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-16 Thread Lazy
2009/7/16 Ken Brownfield :
> Yeah, this one might be one for the Varnish devs.  Only last thing I can
> think of is that Apache is returning a response that's larger than
> Content-Length: and Varnish is just closing the connection...
apache didn't set Content-Length on thiese requests

Yesterday i switched the machine to squid, and errors came back as
httpAppendBody: Request not yet fully sent "POST in squid cache log

Finally I realized that there are many tcp checksum errors in the dump,
I turned off tx and rx checksum offloading and now the checksums are ok,
there was so far no error in 25 minutes (before it was happening in about
15 minute intervals). Maybe the fact that local backend is bound to eth0's ip
address made a difference. I will investigate later. Maybe it will be
worth mentioning
in the wiki if it's proven.

After all it wasn't a varnish error at all.

Thank You all for your time and expertise.

--
Michal Grzedzicki
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Fwd: varnish 2.0.4 backend errors

2009-07-15 Thread Kristian Lyngstol
On Wed, Jul 15, 2009 at 12:14:21PM +0200, Lazy wrote:
> 2009/7/15 Kristian Lyngstol :
> > You could try to turn on vcl-trace. (-p vcl_trace=on).
> 
> right now i have something 100-200 req/s won't it kill the server and
> I can't restart varnish now so I will have try to enable it at runtime

Ok, I'm not entirely sure if that'll work, but it's worth a try (and it
should cause any damage anyway).

> 
> >
> > Also, could you post your VCL in it's entirety? (If you already did, I
> > must've missed it.)
> 
> sub vcl_recv {

(...)

> if (req.http.Cache-Control ~ "no-cache") {
>  pass;

You probably do not want this; It will mean that every time a _client_
sends no-cache, it's a pass. That'll be quite often (on refresh, for
instance), and the refreshed item wont get cached, so it wont refresh the
cache.

(I'll have a look at the cap when I find some time. Could take a few
hours).

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497


pgpH7KLi3pVfs.pgp
Description: PGP signature
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Fwd: varnish 2.0.4 backend errors

2009-07-15 Thread Lazy
2009/7/15 Kristian Lyngstol :
> On Wed, Jul 15, 2009 at 10:01:08AM +0200, Lazy wrote:
> (...)
>> setting 500 as min threads didn't make any diference, funny thing is
>> that failed requests are in logged in apache as succesful
>
> Sorry if I didn't make myself clear: The output from varnishstat didn't
> indicate that threads was the issue causing 503s. It was more of a
> tuning-comment than a shot at solving the real problem. Sorry about the
> confusion.
>
>> 09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S
>> 1066103134:1066103134(0) win 32792 > 552835403 0,nop,wscale 7>
>
> ()
>
> Is it possible for you to mail me the raw cap-file? (Feel free to obfuscate
> the IPs, but I'm really more of an ethereal-man than tcpdump-output...).
> Feel free to drop it directly to me, if that makes it easier.
ok, it will take a minute i have a 50MB file to chop

>
>> successful POST in apache logs,
>>
>> in varnishlog
>>
>> 14 Backend      c 15 default default
>
> (...)
>
> The request must've started earlier, as this is just the backend request.
> Do you have the entire transaction available?
yes

   14 VCL_call c error
   14 VCL_return   c restart
   14 VCL_call c recv
   14 VCL_return   c pass
   14 VCL_call c pass
   14 VCL_return   c pass
   15 BackendOpen  b default xxx 51219 xxx 88
   14 Backend  c 15 default default
   15 TxRequestb POST
   15 TxURLb /quiz/question
   15 TxProtocol   b HTTP/1.1
   15 TxHeader b x-requested-with: XMLHttpRequest
   15 TxHeader b Accept-Language: pl
   15 TxHeader b Accept: text/html, */*
   15 TxHeader b Content-Type: application/x-www-form-urlencoded
   15 TxHeader b UA-CPU: x86
   15 TxHeader b Accept-Encoding: gzip, deflate
   15 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
   15 TxHeader b X-Varnish: 782633863
   15 TxHeader b X-Forwarded-For: 82.177.
   15 BackendClose b default
   14 VCL_call c error
   14 VCL_return   c restart
   14 VCL_call c recv
   14 VCL_return   c pass
   14 VCL_call c pass
   14 VCL_return   c pass
   15 BackendOpen  b default xxx 51235 xxx 88
   14 Backend  c 15 default default

>
> (...)
>
>> restart shouldn't make another backend connection if current one got closed ?
>
> It should, but it should be noted that: 1. We don't know _why_ it fails. 2.
> Restart in vcl_error is brand new, so there might still be some issues.
>
> You could try to turn on vcl-trace. (-p vcl_trace=on).

right now i have something 100-200 req/s won't it kill the server and
I can't restart varnish now so I will have try to enable it at runtime

>
> Also, could you post your VCL in it's entirety? (If you already did, I
> must've missed it.)

backend default {
.host = "";
.port = "88";
}


sub vcl_recv {
if (req.url ~ "\.(png|gif|jpg|swf|css|js|ico)$") {
lookup;
}
if (req.url ~ "landing/") {
lookup;
}

if (req.http.Cache-Control ~ "no-cache") {
 pass;
}
}

sub vcl_fetch {
if (req.url ~ "\.(png|gif|jpg|swf)$") {
unset obj.http.set-cookie;
}

if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache") {
 pass;
}
}


sub vcl_error {
if (req.restarts < 30) {
restart;
}
}
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: Fwd: varnish 2.0.4 backend errors

2009-07-15 Thread Kristian Lyngstol
On Wed, Jul 15, 2009 at 10:01:08AM +0200, Lazy wrote:
(...)
> setting 500 as min threads didn't make any diference, funny thing is
> that failed requests are in logged in apache as succesful

Sorry if I didn't make myself clear: The output from varnishstat didn't
indicate that threads was the issue causing 503s. It was more of a
tuning-comment than a shot at solving the real problem. Sorry about the
confusion.

> 09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S
> 1066103134:1066103134(0) win 32792  552835403 0,nop,wscale 7>

()

Is it possible for you to mail me the raw cap-file? (Feel free to obfuscate
the IPs, but I'm really more of an ethereal-man than tcpdump-output...).
Feel free to drop it directly to me, if that makes it easier. 

> successful POST in apache logs,
> 
> in varnishlog
> 
> 14 Backend      c 15 default default

(...)

The request must've started earlier, as this is just the backend request.
Do you have the entire transaction available? 

(...)

> restart shouldn't make another backend connection if current one got closed ?

It should, but it should be noted that: 1. We don't know _why_ it fails. 2.
Restart in vcl_error is brand new, so there might still be some issues.

You could try to turn on vcl-trace. (-p vcl_trace=on).

Also, could you post your VCL in it's entirety? (If you already did, I
must've missed it.)


-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497


pgp6OC1UYDneP.pgp
Description: PGP signature
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Fwd: varnish 2.0.4 backend errors

2009-07-15 Thread Lazy
2009/7/15 Lazy :
> 2009/7/14 Ken Brownfield :
>> On Jul 14, 2009, at 3:05 AM, Kristian Lyngstol wrote:
>>> On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote:
 the site is usually not so busy, but it has sometimes spikes of
 static
 traffic (about 50Mbps) that's why i upped the thread limit, 3000 was
 to low
>>>
>>> I seriously doubt 3k was too low. More likely, the min threads was
>>> hurting
>>> you. Three thousand threads is quite a bit. Remember that these are
>>> actual
>>> requests being handled, not keep-alive connections and the like.
>>
>> I just wanted to humbly second this good advice; if you're familiar
>> with Apache, this is akin to making sure your MinSpareServers is set
>> to a high enough level to handle any transient spikes by avoiding the
>> cost of spawning new processes.  Varnish will handle 10x the traffic
>> you're seeing in <64 threads.  Anything you're seeing is more likely a
>
> but if backend is slow, (there will be many POSTS form ajax app)
> threads wil get used fast if php starts lagging behind,
> now there are 200 threads running, without raises in overflowed work
> requests, and still there are 503 errors,
> i will uppp min threads later and see if it helps
>
>> concurrency spike causing a temporary slowdown while threads are
>> spawned, or Varnish is simply passing on fail or slowness from your
>> back-end.
> i hope this is it, i will find out tommorrow

setting 500 as min threads didn't make any diference, funny thing is
that failed requests are in logged in apache as succesful

09:23:50.563934 IP x.x.x.x.51235 > x.x.x.x.88: S
1066103134:1066103134(0) win 32792 
09:23:50.563940 IP x.x.x.x.88 > x.x.x.x.51235: S
1060616380:1060616380(0) ack 1066103135 win 32768 
09:23:50.563946 IP x.x.x.x.51235 > x.x.x.x.88: . ack 1 win 257

09:23:54.163755 IP x.x.x.x.88 > x.x.x.x.51235: S
1060616380:1060616380(0) ack 1066103135 win 32768 
09:23:54.163765 IP x.x.x.x.51235 > x.x.x.x.88: . ack 1 win 257

09:23:55.563738 IP x.x.x.x.51235 > x.x.x.x.88: P 1:886(885) ack 1 win
257 
09:23:55.563756 IP x.x.x.x.88 > x.x.x.x.51235: . ack 886 win 270

09:23:55.563838 IP x.x.x.x.51235 > x.x.x.x.88: F 886:886(0) ack 1 win
257 
09:23:55.567177 IP x.x.x.x.88 > x.x.x.x.51235: P 1:2882(2881) ack 887
win 270 
09:23:55.567196 IP x.x.x.x.51235 > x.x.x.x.88: R 1066104021:1066104021(0) win 0


successful POST in apache logs,

in varnishlog

14 Backend      c 15 default default
  15 TxRequest    b POST
  15 TxURL        b /quiz/question
  15 TxProtocol   b HTTP/1.1
  15 TxHeader     b x-requested-with: XMLHttpRequest
  15 TxHeader     b Accept-Language: pl
  15 TxHeader     b Accept: text/html, */*
  15 TxHeader     b Content-Type: application/x-www-form-urlencoded
  15 TxHeader     b UA-CPU: x86
  15 TxHeader     b Accept-Encoding: gzip, deflate
  15 TxHeader     b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
  15 TxHeader     b Host:
  15 TxHeader     b Content-Length: 8
  15 TxHeader     b Cookie: xxx
  15 TxHeader     b X-Varnish: 782633863
  15 TxHeader     b X-Forwarded-For: xxx
  15 BackendClose b default
  14 VCL_call     c error
  14 VCL_return   c restart
  14 VCL_call     c recv
  14 VCL_return   c pass
  14 VCL_call     c error
  14 VCL_return   c restart
  14 VCL_call     c recv
...
  14 Length       c 465
  14 VCL_call     c deliver
  14 VCL_return   c deliver
  14 TxProtocol   c HTTP/1.1
  14 TxStatus     c 503
  14 TxResponse   c Service Unavailable
  14 TxHeader     c Server: Varnish
  14 TxHeader     c Retry-After: 0
  14 TxHeader     c Content-Type: text/html; charset=utf-8
  14 TxHeader     c Content-Length: 465
  14 TxHeader     c Date: Wed, 15 Jul 2009 07:23:55 GMT
  14 TxHeader     c X-Varnish: 782633863
  14 TxHeader     c Age: 20
  14 TxHeader     c Via: 1.1 varnish
  14 TxHeader     c Connection: close
  14 ReqEnd       c 782633863 1247642615.566063166
1247642635.564052582 0.006083488 19.997969866 0.19550
  14 SessionClose c error

restart shouldn't make another backend connection if current one got closed ?

a can't find any subsequent retries in tcpdump

and sometime's restart works

--
Michal Grzedzicki
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-14 Thread Ken Brownfield
On Jul 14, 2009, at 3:05 AM, Kristian Lyngstol wrote:
> On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote:
>> the site is usually not so busy, but it has sometimes spikes of  
>> static
>> traffic (about 50Mbps) that's why i upped the thread limit, 3000 was
>> to low
>
> I seriously doubt 3k was too low. More likely, the min threads was  
> hurting
> you. Three thousand threads is quite a bit. Remember that these are  
> actual
> requests being handled, not keep-alive connections and the like.

I just wanted to humbly second this good advice; if you're familiar  
with Apache, this is akin to making sure your MinSpareServers is set  
to a high enough level to handle any transient spikes by avoiding the  
cost of spawning new processes.  Varnish will handle 10x the traffic  
you're seeing in <64 threads.  Anything you're seeing is more likely a  
concurrency spike causing a temporary slowdown while threads are  
spawned, or Varnish is simply passing on fail or slowness from your  
back-end.

I'm wondering if this could also be a large object that's taking a  
while to cache and blocking other children for a while? (rush_exponent)
-- 
Ken.
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-14 Thread Kristian Lyngstol
On Tue, Jul 14, 2009 at 11:46:58AM +0200, Lazy wrote:
> 2009/7/14 Kristian Lyngstol :
> > 6000 threads is too much. Since it's per pool, it'll cause up to 12 000
> > threads to start. That's not likely to go over all that well. If you have
> > that sort of traffic, you need to scale out. Also, 10 thread minimum is
> > pretty low.
> >
> > I typically recommend setting the minimum thread count to what you expect
> > your normal traffic to be at peak hours. It's probably a dedicated
> > machines, and idle threads have barely any overhead, while creating new
> > threads can take some time.
> 
> at first i had 3000 threads set and varnish ocassionly droped
> connections, so I doubled it
> 
> so what whould be a recomended values ?
> will -w 1024,1024 -p thread_pools=6 whould be ok ?

6 thread pools is overkill. And the number of threads is multiplied with
the thread pools, so in this case you're essentially writing -w 6k,6k... 

I'd advice something like -w 200,1200 -p thread_pools=2 to begin with. Just
watch the overflows in varnishstat (it'll increase a good bit during
startup since it takes a little time to create the 400 threads). It should
stay fairly static after startup.

> the site is usually not so busy, but it has sometimes spikes of static
> traffic (about 50Mbps) that's why i upped the thread limit, 3000 was
> to low

I seriously doubt 3k was too low. More likely, the min threads was hurting
you. Three thousand threads is quite a bit. Remember that these are actual
requests being handled, not keep-alive connections and the like.

> is it safe to change thread_pools on runtime ?

Safe; I'd assume so. But I don't know if it actually takes effect. I've yet
to see any good reason to change it from the default.

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497


pgpGG7y6WlKUh.pgp
Description: PGP signature
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-14 Thread Lazy
2009/7/14 Kristian Lyngstol :
> On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote:
>> We are having hard time figuring out what's cosing varnish 503 error,
>> our backend is apache is debian 5 default, os is linux x86_64 2.6.26,
>> everything is running on a single machine
>>
>> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f
>> /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w
>> 10,6000,300 -u nobody
>
> 6000 threads is too much. Since it's per pool, it'll cause up to 12 000
> threads to start. That's not likely to go over all that well. If you have
> that sort of traffic, you need to scale out. Also, 10 thread minimum is
> pretty low.
>
> I typically recommend setting the minimum thread count to what you expect
> your normal traffic to be at peak hours. It's probably a dedicated
> machines, and idle threads have barely any overhead, while creating new
> threads can take some time.

at first i had 3000 threads set and varnish ocassionly droped
connections, so I doubled it

so what whould be a recomended values ?
will -w 1024,1024 -p thread_pools=6 whould be ok ?

the site is usually not so busy, but it has sometimes spikes of static
traffic (about 50Mbps) that's why i upped the thread limit, 3000 was
to low

is it safe to change thread_pools on runtime ?

>
>> running with a single backend
>> .connect_timeout = 1s; added to the backend definition
>
> Any particular reason for adding that?

originally it wasn't there i added it trying to go around the issue

>
>> I added
>>
>> sub vcl_error {
>>     if (req.restarts < 10) {
>>         restart;
>>     }
>> }
>>
>> (is it possible to add a pause before doing restart ?)
>
> No. This is also a dirty workaround for a fundamental problem.
>
>> In about 0.1% of request we get
>>
>>    10 TxRequest    b POST
>>    10 TxURL        b /php
>>    10 TxProtocol   b HTTP/1.1
>>    10 TxHeader     b x-requested-with: XMLHttpRequest
>>    10 TxHeader     b Accept-Language: pl
>>    10 TxHeader     b Referer: http://www.x/php
>>    10 TxHeader     b Accept: text/html, */*
>>    10 TxHeader     b Content-Type: application/x-www-form-urlencoded
>>    10 TxHeader     b UA-CPU: x86
>>    10 TxHeader     b Accept-Encoding: gzip, deflate
>>    10 TxHeader     b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
>> Windows NT 5.1)
>>    10 TxHeader     b Content-Length: 8
>>    10 TxHeader     b Cookie: _.1
>>    10 TxHeader     b X-NovINet: v1.2
>>    10 TxHeader     b X-Varnish: 603437812
>>    10 TxHeader     b X-Forwarded-For: 79.162.xxx
>>    10 BackendClose b default
>>    31 VCL_call     c error
>>    31 VCL_return   c deliver
>>    31 Length       c 465
>>    31 VCL_call     c deliver
>>    31 VCL_return   c deliver
>>    31 TxProtocol   c HTTP/1.1
>>    31 TxStatus     c 503
>>
>> machine is not overloaded, there are 150 apache running 80% of them is idle
>>
>> what does
>> 31 VCL_call     c error mean , a connection error, apache returned
>> invalid response ?
>
> No, it just means that vcl_error is called. BackendClose notes that the
> connection to the backend was closed.
>
>> can I get some more information about this error using some syslog in
>> vcl_error or mayby in some other way ?
>
> Possibly, but using syslog in vcl is the last thing I'd recommend.
>
> Does your syslog say anything meaningful? Like assert-errors...
no, only info about admin commands

> (...)
>>        60064  Backend connections failures
>> this is old and it's not changing now
>
> Did the error-rate go down once you solved this? What was causing these
> problems?
it was related to load testing, in production it went away when i
upped maxclients on apache


>
>>           20  N worker threads
>>         4152  N worker threads created
>>            0  N worker threads not created
>>            0  N worker threads limited
>>            0  N queued work requests
>>       226847  N overflowed work requests
>
> This is what I mean with -w 10,6000 being wrong. After the initial startup,
> overflowed work requests shouldn't grow much, and you're currently running
> at only 20 threads (the minimum), which will cause overflows very fast
> (consider how many connections a single client will use to fetch a front
> page... You can easily imagine overflowing with just 3-4 concurrent
> clients.)
>
> But that's not really causing any 503s. Just delays while threads are
> created (and removed).

tcpdump of another 503 (apache is running on port 88),

11:09:50.187842 IP x.x.x.x.50780 > x.x.x.x.88: S 88526893:88526893(0)
win 32792 
11:09:50.187851 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 
11:09:50.187867 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257

11:09:53.187730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 
11:09:53.187740 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257

11:09:59.191730 IP x.x.x.x.88 > x.x.x.x.50780: S 81484078:81484078(0)
ack 88526894 win 32768 
11:09:59.191744 IP x.x.x.x.50780 > x.x.x.x.88: . ack 1 win 257

1

Re: varnish 2.0.4 backend errors

2009-07-14 Thread Kristian Lyngstol
On Sat, Jul 11, 2009 at 12:21:38AM +0200, Lazy wrote:
> We are having hard time figuring out what's cosing varnish 503 error,
> our backend is apache is debian 5 default, os is linux x86_64 2.6.26,
> everything is running on a single machine
> 
> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f
> /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w
> 10,6000,300 -u nobody

6000 threads is too much. Since it's per pool, it'll cause up to 12 000
threads to start. That's not likely to go over all that well. If you have
that sort of traffic, you need to scale out. Also, 10 thread minimum is
pretty low.

I typically recommend setting the minimum thread count to what you expect
your normal traffic to be at peak hours. It's probably a dedicated
machines, and idle threads have barely any overhead, while creating new
threads can take some time.

> running with a single backend
> .connect_timeout = 1s; added to the backend definition

Any particular reason for adding that?

> I added
> 
> sub vcl_error {
> if (req.restarts < 10) {
> restart;
> }
> }
> 
> (is it possible to add a pause before doing restart ?)

No. This is also a dirty workaround for a fundamental problem.

> In about 0.1% of request we get
> 
>10 TxRequestb POST
>10 TxURLb /php
>10 TxProtocol   b HTTP/1.1
>10 TxHeader b x-requested-with: XMLHttpRequest
>10 TxHeader b Accept-Language: pl
>10 TxHeader b Referer: http://www.x/php
>10 TxHeader b Accept: text/html, */*
>10 TxHeader b Content-Type: application/x-www-form-urlencoded
>10 TxHeader b UA-CPU: x86
>10 TxHeader b Accept-Encoding: gzip, deflate
>10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
> Windows NT 5.1)
>10 TxHeader b Content-Length: 8
>10 TxHeader b Cookie: _.1
>10 TxHeader b X-NovINet: v1.2
>10 TxHeader b Host: www.kinograj.cinema-city.pl
>10 TxHeader b X-Varnish: 603437812
>10 TxHeader b X-Forwarded-For: 79.162.xxx
>10 BackendClose b default
>31 VCL_call c error
>31 VCL_return   c deliver
>31 Length   c 465
>31 VCL_call c deliver
>31 VCL_return   c deliver
>31 TxProtocol   c HTTP/1.1
>31 TxStatus c 503
> 
> machine is not overloaded, there are 150 apache running 80% of them is idle
> 
> what does
> 31 VCL_call c error mean , a connection error, apache returned
> invalid response ?

No, it just means that vcl_error is called. BackendClose notes that the
connection to the backend was closed.

> can I get some more information about this error using some syslog in
> vcl_error or mayby in some other way ?

Possibly, but using syslog in vcl is the last thing I'd recommend.

Does your syslog say anything meaningful? Like assert-errors...

(...)
>60064  Backend connections failures
> this is old and it's not changing now

Did the error-rate go down once you solved this? What was causing these
problems?
 
>   20  N worker threads
> 4152  N worker threads created
>0  N worker threads not created
>0  N worker threads limited
>0  N queued work requests
>   226847  N overflowed work requests

This is what I mean with -w 10,6000 being wrong. After the initial startup,
overflowed work requests shouldn't grow much, and you're currently running
at only 20 threads (the minimum), which will cause overflows very fast
(consider how many connections a single client will use to fetch a front
page... You can easily imagine overflowing with just 3-4 concurrent
clients.)

But that's not really causing any 503s. Just delays while threads are
created (and removed).

-- 
Kristian Lyngstøl
Redpill Linpro AS
Tlf: +47 21544179
Mob: +47 99014497


pgpGupl4jmDwn.pgp
Description: PGP signature
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-13 Thread Lazy
2009/7/13 Lazy :
> One more thing
> I set the processor affinity for all varnish threads so it will run on
> a single cpu, to check if it's not some race condition, so far I
> failed to trigger any errors by using ab, I will see tommorow when
> some real traffic will show up.

didn't make any diference
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-13 Thread Lazy
2009/7/14 Ken Brownfield :
> The progression from your dump is:
>
> Varnish         Apache
> SYN
>                SYN+ACK
> ACK
> ...4.4 seconds later...
>                SYN+ACK
> ACK
> PSH+ACK
>                ACK
> FIN+ACK (???)
>                PSH+ACK
> RST

right i missed the 4second gap, so varnish may be hitting a timeout

>
> It looks like the ACK from the Varnish side is getting lost on its way to
> Apache, and Apache retransmits.  This itself would imply that you have some
> packet loss between Varnish and Apache.  That 4.4 second delay could easily
> be running you against the sess_timeout (or cli_timeout, maybe).  Bumping
> those up to 10 might clear the issue for you.
>
> Obviously, packet loss will cause issues in general, so I'd investigate that
> anyway.
it's a local interface, not lo but the traffic is going threw loopback interface

>
> More bizarrely, that FIN+ACK is missing a FIN from Apache.  Are you sure
> this stream is correct?  I have no idea what would cause a spurious FIN+ACK
> (besides a spurious FIN) and the issue would be highly unlikely a Varnish
> issue; more likely a kernel TCP stack issue.
>
> Is there anything between Varnish and Apache in your config, besides two
> machines on a shared switch?  Proxies?  Firewall rules?  Switch ACLs?  What
> OSes/versions are you running on each side?

it's only 1 machine
___
varnish-misc mailing list
varnish-misc@projects.linpro.no
http://projects.linpro.no/mailman/listinfo/varnish-misc


Re: varnish 2.0.4 backend errors

2009-07-13 Thread Lazy
One more thing
I set the processor affinity for all varnish threads so it will run on
a single cpu, to check if it's not some race condition, so far I
failed to trigger any errors by using ab, I will see tommorow when
some real traffic will show up.

2009/7/13 Ken Brownfield :
> I would try correlating these 503's with actual Apache log lines.  That way,
> you'll see what Apache told Varnish (if it made it to Apache at all).
>
> Also, try tcpdumping the backend port to a file, and then use wireshark to
> go back to the traffic at the time you see another 503.
>
> Varnish should be logging failed health checks, which could explain periodic
> 503s.  If the health checks are succeeding, then Apache is likely
> periodically closing a connection or sending bad data.
>
> You could also try turning off HTTP 1.1 to Apache from Varnish
> (backend_http11=off) to see if that does anything.  With Varnish fronting
> Apache, doing 1.1 to the backend isn't necessarily a huge win.
>
> In the 503 case specifically, I'm not sure if restarts are respected.  Does
> this happen if you relax the connect_timeout, or if you remove the restart?
>  Does this happen only with POST?  Do you have any custom settings in
> vcl_pipe(), or do you bail out of vcl_recv() without falling through to the
> default VCL?
>
> Finally, did you edit the Cookie header?  It's malformed, which in theory
> could cause either Apache or Varnish to dump it.
> --
> Ken.
>
> On Jul 10, 2009, at 3:21 PM, Lazy wrote:
>
>> Hi,
>>
>> We are having hard time figuring out what's cosing varnish 503 error,
>> our backend is apache is debian 5 default, os is linux x86_64 2.6.26,
>> everything is running on a single machine
>>
>> /usr/local/sbin/varnishd -a 0.0.0.0:80 -f
>> /usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w
>> 10,6000,300 -u nobody
>>
>> running with a single backend
>> .connect_timeout = 1s; added to the backend definition
>>
>> I added
>>
>> sub vcl_error {
>>   if (req.restarts < 10) {
>>       restart;
>>   }
>> }
>>
>> (is it possible to add a pause before doing restart ?)
>>
>> which helps in some cases but not always
>>
>>
>> In about 0.1% of request we get
>>
>>  10 TxRequest    b POST
>>  10 TxURL        b /php
>>  10 TxProtocol   b HTTP/1.1
>>  10 TxHeader     b x-requested-with: XMLHttpRequest
>>  10 TxHeader     b Accept-Language: pl
>>  10 TxHeader     b Referer: http://www.x/php
>>  10 TxHeader     b Accept: text/html, */*
>>  10 TxHeader     b Content-Type: application/x-www-form-urlencoded
>>  10 TxHeader     b UA-CPU: x86
>>  10 TxHeader     b Accept-Encoding: gzip, deflate
>>  10 TxHeader     b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
>> Windows NT 5.1)
>>  10 TxHeader     b Content-Length: 8
>>  10 TxHeader     b Cookie: _.1
>>  10 TxHeader     b X-NovINet: v1.2
>>  10 TxHeader     b Host: www.kinograj.cinema-city.pl
>>  10 TxHeader     b X-Varnish: 603437812
>>  10 TxHeader     b X-Forwarded-For: 79.162.xxx
>>  10 BackendClose b default
>>  31 VCL_call     c error
>>  31 VCL_return   c deliver
>>  31 Length       c 465
>>  31 VCL_call     c deliver
>>  31 VCL_return   c deliver
>>  31 TxProtocol   c HTTP/1.1
>>  31 TxStatus     c 503
>>
>> machine is not overloaded, there are 150 apache running 80% of them is
>> idle
>>
>> what does
>> 31 VCL_call     c error mean , a connection error, apache returned
>> invalid response ?
>>
>> can I get some more information about this error using some syslog in
>> vcl_error or mayby in some other way ?
>>
>> my param.show
>> accept_fd_holdoff          50 [ms]
>> acceptor                   default (epoll, poll)
>> auto_restart               on [bool]
>> backend_http11             on [bool]
>> between_bytes_timeout      60.00 [s]
>> cache_vbe_conns            off [bool]
>> cc_command                 "exec cc -fpic -shared -Wl,-x -o %o %s"
>> cli_buffer                 8192 [bytes]
>> cli_timeout                5 [seconds]
>> client_http11              off [bool]
>> clock_skew                 10 [s]
>> connect_timeout            1.00 [s]
>> default_grace              10
>> default_ttl                120 [seconds]
>> diag_bitmap                0x0 [bitmap]
>> err_ttl                    0 [seconds]
>> esi_syntax                 0 [bitmap]
>> fetch_chunksize            128 [kilobytes]
>> first_byte_timeout         60.00 [s]
>> group                      nogroup (65534)
>> listen_address             0.0.0.0:80
>> listen_depth               1024 [connections]
>> log_hashstring             off [bool]
>> log_local_address          off [bool]
>> lru_interval               2 [seconds]
>> max_esi_includes           5 [includes]
>> max_restarts               4 [restarts]
>> obj_workspace              8192 [bytes]
>> overflow_max               100 [%]
>> ping_interval              3 [seconds]
>> pipe_timeout               60 [seconds]
>> prefer_ipv6                off [bool]
>> purge_dups                 off [bool]
>> purge_hash                 on [bool]
>> rush_exponent  

Fwd: varnish 2.0.4 backend errors

2009-07-13 Thread Lazy
2009/7/13 Ken Brownfield :
> I would try correlating these 503's with actual Apache log lines.  That way,
> you'll see what Apache told Varnish (if it made it to Apache at all).
there are no sign of thiese requests in apache logs

>
> Also, try tcpdumping the backend port to a file, and then use wireshark to
> go back to the traffic at the time you see another 503.

it happens with or without keep-alive (it was set before and I
disabled it recently)

it's likelly the same issue as in "tcp reset problem with varnish
2.0.4 o n Solaris 10 (SPARC)" thread

it happens on 0,09% of request's, mostly POST'S but not only

now I rechecked the logs and I found that varnish closed the
connection just after sending the request
tcpdump follows, there are no backend connection errors
01:03:50.181412 IP x.x.x.x.48563 > x.x.x.x.88: S
3999574413:3999574413(0) win 32792 
       0x:  4500 003c 6586 4000 4006 3b56 566f f680  E.. x.x.x.x.48563: S
4006066350:4006066350(0) ack 3999574414 win 32768 
       0x:  4500 003c  4000 4006 a0dc 566f f680  E..<@.@...Vo..
       0x0010:  566f f680 0058 bdb3 eec7 b8ae ee64 a98e  Vo...X...d..
       0x0020:  a012 8000 2776  0204 400c 0402 080a  'v@.
       0x0030:  1b5a cc0b 1b5a cc0b 0103 0307            .Z...Z..
01:03:50.181433 IP x.x.x.x.48563 > x.x.x.x.88: . ack 1 win 257

       0x:  4500 0034 6587 4000 4006 3b5d 566f f680  e.@.@.;]Vo..
       0x0010:  566f f680 bdb3 0058 ee64 a98e eec7 b8af  Vo.X.d..
       0x0020:  8010 0101 0f9a  0101 080a 1b5a cc0b  .Z..
       0x0030:  1b5a cc0b                                .Z..
01:03:54.581224 IP x.x.x.x.88 > x.x.x.x.48563: S
4006066350:4006066350(0) ack 3999574414 win 32768 
       0x:  4500 003c  4000 4006 a0dc 566f f680  E..<@.@...Vo..
       0x0010:  566f f680 0058 bdb3 eec7 b8ae ee64 a98e  Vo...X...d..
       0x0020:  a012 8000 232a  0204 400c 0402 080a  #*@.
       0x0030:  1b5a d057 1b5a cc0b 0103 0307            .Z.W.Z..
01:03:54.581245 IP x.x.x.x.48563 > x.x.x.x.88: . ack 1 win 257

       0x:  4500 0040 6588 4000 4006 3b50 566f f680  ...@e.@@.;PVo..
       0x0010:  566f f680 bdb3 0058 ee64 a98e eec7 b8af  Vo.X.d..
       0x0020:  b010 0101 81fc  0101 080a 1b5a d057  .Z.W
       0x0030:  1b5a d057 0101 050a eec7 b8ae eec7 b8af  .Z.W
01:03:55.181239 IP x.x.x.x.48563 > x.x.x.x.88: P 1:896(895) ack 1 win
257 
       0x:  4500 03b3 6589 4000 4006 37dc 566f f680  e.@.@.7.Vo..
       0x0010:  566f f680 bdb3 0058 ee64 a98e eec7 b8af  Vo.X.d..
       0x0020:  8018 0101 9d85  0101 080a 1b5a d0ed  .Z..
       0x0030:  1b5a d057 504f 5354 202f 7175 697a 2f71  .Z.WPOST./quiz/q
       0x0040:  7565 7374 696f 6e20 4854 5450 2f31 2e31  uestion.HTTP/1.1
       0x0050:  0d0a 782d 7265 7175 6573 7465 642d 7769  ..x-requested-wi
       0x0060:  7468 3a20 584d 4c48 7474 7052 6571 7565  th:.XMLHttpReque
.. request data

01:03:55.181256 IP x.x.x.x.88 > x.x.x.x.48563: . ack 896 win 270

                       0x:  4500 0034 6842 4000 4006 38a2 566f
f680  e.@.@.8.Vo..
                       0x0010:  566f f680 0058 bdb3 eec7 b8af ee64
ad0d  Vo...X...d..
                       0x0020:  8010 010e 024a  0101 080a 1b5a
d0ed  .J...Z..
                       0x0030:  1b5a d0ed                                .Z..
... acked
01:03:55.181302 IP x.x.x.x.48563 > x.x.x.x.88: F 896:896(0) ack 1 win
257 
                       0x:  4500 0034 658a 4000 4006 3b5a 566f
f680  e.@.@.;ZVo..
                       0x0010:  566f f680 bdb3 0058 ee64 ad0d eec7
b8af  Vo.X.d..
                       0x0020:  8011 0101 0256  0101 080a 1b5a
d0ed  .V...Z..
                       0x0030:  1b5a d0ed                                .Z..
... connection closed ???
01:03:55.185068 IP x.x.x.x.88 > x.x.x.x.48563: P 1:2485(2484) ack 897
win 270 
                       0x:  4500 09e8 6843 4000 4006 2eed 566f
f680  e.@.@...Vo..
                       0x0010:  566f f680 0058 bdb3 eec7 b8af ee64
ad0e  Vo...X...d..
                       0x0020:  8018 010e a3ba  0101 080a 1b5a
d0ed  .Z..
                       0x0030:  1b5a d0ed 4854 5450 2f31 2e31 2032
3030  .Z..HTTP/1.1.200
                       0x0040:  204f 4b0d 0a44 6174 653a 2046 7269
2c20  .OK..Date:.Fri,.
                       0x0050:  3130 204a 756c 2032 3030 3920 3233
3a30  10.Jul.2009.23:0
                       0x0060:  333a 3535 2047 4d54 0d0a 5365 7276
6572  3:55.GMT..Server
                       0x0070:  3a20 4170 6163 6865 0d0a 4578 7069
7265  :.Apache..Expire
                       0x0080:  733a 2054 6875 2c20 3139 204e 6f76
2031  s:.Thu,.19.Nov.1
                       0x0090:  3938 3120 3038 3a35 323a 3030 2047
4d54  981.08:52:00.GMT
                       0x00a0:  0d0a 4361 6368 652d 436f 6e74 726f
6c3a  ..Cache-Control:
                  

varnish 2.0.4 backend errors

2009-07-13 Thread Lazy
Hi,

We are having hard time figuring out what's cosing varnish 503 error,
our backend is apache is debian 5 default, os is linux x86_64 2.6.26,
everything is running on a single machine

/usr/local/sbin/varnishd -a 0.0.0.0:80 -f
/usr/local/etc/varnish/default.vcl -s malloc -T localhost: -w
10,6000,300 -u nobody

running with a single backend
.connect_timeout = 1s; added to the backend definition

I added

sub vcl_error {
if (req.restarts < 10) {
restart;
}
}

(is it possible to add a pause before doing restart ?)

which helps in some cases but not always


In about 0.1% of request we get

   10 TxRequestb POST
   10 TxURLb /php
   10 TxProtocol   b HTTP/1.1
   10 TxHeader b x-requested-with: XMLHttpRequest
   10 TxHeader b Accept-Language: pl
   10 TxHeader b Referer: http://www.x/php
   10 TxHeader b Accept: text/html, */*
   10 TxHeader b Content-Type: application/x-www-form-urlencoded
   10 TxHeader b UA-CPU: x86
   10 TxHeader b Accept-Encoding: gzip, deflate
   10 TxHeader b User-Agent: Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 5.1)
   10 TxHeader b Content-Length: 8
   10 TxHeader b Cookie: _.1
   10 TxHeader b X-NovINet: v1.2
   10 TxHeader b Host: www.kinograj.cinema-city.pl
   10 TxHeader b X-Varnish: 603437812
   10 TxHeader b X-Forwarded-For: 79.162.xxx
   10 BackendClose b default
   31 VCL_call c error
   31 VCL_return   c deliver
   31 Length   c 465
   31 VCL_call c deliver
   31 VCL_return   c deliver
   31 TxProtocol   c HTTP/1.1
   31 TxStatus c 503

machine is not overloaded, there are 150 apache running 80% of them is idle

what does
31 VCL_call c error mean , a connection error, apache returned
invalid response ?

can I get some more information about this error using some syslog in
vcl_error or mayby in some other way ?

my param.show
accept_fd_holdoff  50 [ms]
acceptor   default (epoll, poll)
auto_restart   on [bool]
backend_http11 on [bool]
between_bytes_timeout  60.00 [s]
cache_vbe_connsoff [bool]
cc_command "exec cc -fpic -shared -Wl,-x -o %o %s"
cli_buffer 8192 [bytes]
cli_timeout5 [seconds]
client_http11  off [bool]
clock_skew 10 [s]
connect_timeout1.00 [s]
default_grace  10
default_ttl120 [seconds]
diag_bitmap0x0 [bitmap]
err_ttl0 [seconds]
esi_syntax 0 [bitmap]
fetch_chunksize128 [kilobytes]
first_byte_timeout 60.00 [s]
group  nogroup (65534)
listen_address 0.0.0.0:80
listen_depth   1024 [connections]
log_hashstring off [bool]
log_local_address  off [bool]
lru_interval   2 [seconds]
max_esi_includes   5 [includes]
max_restarts   4 [restarts]
obj_workspace  8192 [bytes]
overflow_max   100 [%]
ping_interval  3 [seconds]
pipe_timeout   60 [seconds]
prefer_ipv6off [bool]
purge_dups off [bool]
purge_hash on [bool]
rush_exponent  3 [requests per request]
send_timeout   600 [seconds]
sess_timeout   5 [seconds]
sess_workspace 16384 [bytes]
session_linger 0 [ms]
shm_reclen 255 [bytes]
shm_workspace  8192 [bytes]
srcaddr_hash   1049 [buckets]
srcaddr_ttl30 [seconds]
thread_pool_add_delay  20 [milliseconds]
thread_pool_add_threshold  2 [requests]
thread_pool_fail_delay 200 [milliseconds]
thread_pool_max6000 [threads]
thread_pool_min10 [threads]
thread_pool_purge_delay1000 [milliseconds]
thread_pool_timeout300 [seconds]
thread_pools   2 [pools]
user   nobody (65534)
vcl_trace  off [bool]

stats
 4571680  Client connections accepted
13074671  Client requests received
 9246516  Cache hits
8084  Cache hits for pass
  159743  Cache misses
 3768909  Backend connections success
   0  Backend connections not attempted
   0  Backend connections too many
   60064  Backend connections failures
this is old and it's not changing now

 2610440  Backend connections reuses
 3471493  Backend connections recycles
   0  Backend connections unused
1053  N struct srcaddr
   6  N active struct srcaddr
1017  N struct sess_mem
  60  N struct sess
 852  N struct object
4183  N struct objecthead
   0  N struct smf
   0  N small free smf
   0  N large free smf
  22  N struct vbe_conn
 494  N struct bereq
  20  N worker threads
4152  N worker threads created
   0  N worker thr