OK, I have reproduced the bug on the unpatched trunk (revision 25c5f2ed3229e41e99eadff57374c3a93b41a356) without any custom VCL (the only section in my VCL file is the backend specification).

The command used to run varnishd was:

/opt/varnish/sbin/varnishd \
    -a 0.0.0.0:6802 \
    -f /opt/varnish/etc/varnish/my.vcl \
    -P /var/run/varnishd.pid \
    -T 127.0.0.1:2000 \
    -d \
    -s file,/opt/varnish/var/varnish/storage.bin,1G

The system is running Debian with a 32-bit kernel. As I mentioned earlier, I was able to reproduce the problem on another machine with a significantly different hardware configuration. The only thing the two had in common was that both were running Debian with a 32-bit kernel, and I used the same binaries on both machines. I could not reproduce the problem in a 64-bit environment.

I'm attaching the stack trace and the log file. Please let me know if I can provide any more info.

On 09/03/2011 14:51, Geoff Simmons wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On 03/ 9/11 03:17 PM, Dmitry Panov wrote:
>> Just a heads up, I'm getting assertion failures when running a rather
>> simple testcase: using local apache that serves /user/share/doc as the
>> backend and running wget -r http://localhost:6802/doc Shortly after that
>> the following errors start to appear:
>>
>> Child (11125) Panic message: Assert error in http_Write(), cache_http.c
>> line 1181:
>>    Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
>> thread = (cache-worker)
>
> Thanks for the heads up. Can you send over the whole stack trace?
>
>> I have been able to reproduce it on 2 different machines with very
>> different hardware configurations which makes hardware problem quite
>> unlikely. Also
>>
>> httperf --server localhost --port 6802 --uri /  --num-conns 1
>> --num-calls 4000
>>
>> runs without a problem.
>>
>> These 2 machines both run 32bit linux kernel. I haven't been able to
>> reproduce the problem in a 64bit environment.
>
> Could be running out of workspace. I fixed a similar error during the
> course of development, which had to do with the fact that sufficient
> workspace has to be allocated for the both backend response *and* the
> stale object; you might have found something related. Also, I've only
> been testing with 64 bit; looks like I better test 32 bit as well.
>
> Is there any way you can send the request & response that are being
> processed when the error happens?
>
> And what if you set --num-conns high and --num-calls low, say 400
> connections and 10 calls per connection? Or keep setting --num-conns
> higher, to see if you can provoke the error? I've been running httperf
> with 25,000 connections and 1000 calls per connection, found a memory
> leak that way.
>
>> Unfortunately I haven't got time to try the unpatched trunk (I tried it
>> with revisions 3 and 4 of the patch) or do any further experiments but
>> I'll try to do so in the next couple of days and then post more details.
>
> It's a good idea to test on the unpatched trunk as well, to make sure
> that the bug really comes from the patch.
>
> Thanks very much for the feedback!



Best regards,

--
Dmitry Panov

Attachment: logfile.txt.gz
Description: GNU Zip compressed data

Child (7071) died signal=6
Child (7071) Panic message: Assert error in http_Write(), cache_http.c line 1181:
  Condition((hp->hd[HTTP_HDR_STATUS].b) != 0) not true.
thread = (cache-worker)
ident = Linux,2.6.26-2-686,i686,-sfile,-smalloc,-hcritbit,epoll
Backtrace:
  0x807fd2d: pan_backtrace+24
  0x807ffd6: pan_ic+193
  0x807ca25: http_Write+e6
  0x8084eab: RES_WriteObj+1cb
  0x805efd5: cnt_deliver+5ec
  0x80633c3: CNT_Session+6ae
  0x80824de: wrk_do_cnt_sess+160
  0x8081c6c: wrk_thread_real+d36
  0x80820d9: wrk_thread+109
  0xb76f2955: _end+af61e065
sp = 0x6e137004 {
  fd = 11, id = 11, xid = 721834422,
  client = 127.0.0.1 42805,
  step = STP_DELIVER,
  handling = deliver,
  err_code = 200, err_reason = (null),
  restarts = 0, esi_level = 0
  ws = 0x6e137054 { 
    id = "sess",
    {s,f,r,e} = {0x6e137800,+220,(nil),+16384},
  },
  http[req] = {
    ws = 0x6e137054[sess]
      "GET",
      "/doc/dvd+rw-tools/",
      "HTTP/1.0",
      "Referer: http://localhost:6802/doc/",
      "User-Agent: Wget/1.11.4",
      "Accept: */*",
      "Host: localhost:6802",
      "Connection: Keep-Alive",
      "X-Forwarded-For: 127.0.0.1",
  },
  worker = 0xb75130e0 {
    ws = 0xb7513218 { overflow
      id = "wrk",
      {s,f,r,e} = {0xb750cfb0,+16384,(nil),+16384},
    },
    http[resp] = {
      ws = 0xb7513218[wrk]
        "HTTP/1.1",
        "OK",
        "Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
        "Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
        "ETag: "2222-fc35-450564fcbabc0"",
        "Vary: Accept-Encoding",
        "Content-Type: text/html",
        "Via: 1.1 varnish",
    },
    },
    vcl = {
      srcname = {
        "input",
        "Default",
      },
    },
  obj = 0x8fbb4000 {
    xid = 721834422,
    ws = 0x8fbb4010 { 
      id = "obj",
      {s,f,r,e} = {0x8fbb415c,+300,(nil),+316},
    },
    http[obj] = {
      ws = 0x8fbb4010[obj]
        "HTTP/1.1",
        "OK",
        "Date: Wed, 09 Mar 2011 18:47:20 GMT",
        "Server: Apache/2.2.9 (Debian) proxy_html/3.0.1",
        "Last-Modified: Mon, 23 Jun 2008 14:32:23 GMT",
        "ETag: "2222-fc35-450564fcbabc0"",
        "Vary: Accept-Encoding",
        "Content-Encoding: gzip",
        "Content-Type: text/html",
        "Content-Length: 23730",
    },
    len = 23730,
    store = {
      23730 {
        1f 8b 08 00 00 00 00 00 00 03 cd 7d fb 77 db c8 |...........}.w..|
        91 ee cf ee bf 02 a3 dc 1d 4b 09 1f 92 2c 7b 3c |.........K...,{<|
        33 b6 e6 ea 65 5b 3b b6 e5 48 f2 38 b3 3a 3a 3e |3...e[;..H.8.::>|
        20 09 8a 18 83 00 03 80 92 99 b3 27 7f fb ad af | ..........'....|
        [23666 more]
      },
    },
  },
},
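
For anyone reading dumps like the one above, the following is a rough sketch (not part of Varnish) that scans a panic message for workspace blocks and reports how much room each one has left. It assumes the {s,f,r,e} fields mean start, free pointer, reservation and end, which is my reading of the workspace structure rather than something stated in this thread. The function name scan_workspaces and the SAMPLE text are made up for illustration; SAMPLE just repeats the three workspace blocks from this dump.

```python
import re

# Matches a workspace header such as:  ws = 0xb7513218 { overflow
# (lines like "ws = 0x6e137054[sess]" are references, not headers, and do not match)
WS_HEADER = re.compile(r"ws = (0x[0-9a-f]+) \{\s*(overflow)?")
WS_ID = re.compile(r'id = "(\w+)"')
# Matches the offsets line:  {s,f,r,e} = {0xb750cfb0,+16384,(nil),+16384},
WS_OFFSETS = re.compile(
    r"\{s,f,r,e\} = \{(0x[0-9a-f]+),\+(\d+),([^,]+),\+(\d+)\}"
)


def scan_workspaces(panic_text):
    """Return one dict per workspace block found in a panic dump."""
    found = []
    current = None
    for line in panic_text.splitlines():
        header = WS_HEADER.search(line)
        if header:
            current = {
                "addr": header.group(1),
                "overflow": header.group(2) is not None,
            }
            continue
        if current is None:
            continue
        ident = WS_ID.search(line)
        if ident:
            current["id"] = ident.group(1)
            continue
        offsets = WS_OFFSETS.search(line)
        if offsets:
            # Assumption: f and e are byte offsets from s (used and total size).
            used, size = int(offsets.group(2)), int(offsets.group(4))
            current.update(used=used, size=size, free=size - used)
            found.append(current)
            current = None
    return found


# Workspace blocks copied from the dump in this mail.
SAMPLE = """\
  ws = 0x6e137054 {
    id = "sess",
    {s,f,r,e} = {0x6e137800,+220,(nil),+16384},
  },
    ws = 0xb7513218 { overflow
      id = "wrk",
      {s,f,r,e} = {0xb750cfb0,+16384,(nil),+16384},
    },
    ws = 0x8fbb4010 {
      id = "obj",
      {s,f,r,e} = {0x8fbb415c,+300,(nil),+316},
    },
"""

for ws in scan_workspaces(SAMPLE):
    flag = "OVERFLOW" if ws["overflow"] else "ok"
    print(f'{ws.get("id", "?"):5} used={ws["used"]:6} free={ws["free"]:6} {flag}')
```

On this dump the "wrk" workspace is the only one flagged, with no free bytes left, which at least looks consistent with the out-of-workspace suggestion quoted earlier; it does not by itself say why the worker workspace filled up.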

_______________________________________________
varnish-dev mailing list
[email protected]
http://www.varnish-cache.org/lists/mailman/listinfo/varnish-dev
