Hello,

I recently upgraded from trunk r4602 to trunk r4632, and I'm now seeing a lot of varnish panics in syslog. Looking through the changelog, it appears that r4631 added an assert() to bin/varnishd/cache_hash.c. The code now seems to be hitting that assert and failing, which crashes the child process. Varnish itself keeps running without problems, although the cache gets completely cleared out according to varnishstat. I'm not seeing a definitive pattern in when the assertion fails, but it happens roughly every five or six minutes. The overall traffic to the site is not huge (probably only about 3000 page requests in the interval before each crash).

Some excerpts from syslog are below:

Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) died signal=6
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) Panic message: Assert error in HSH_DeleteObjHead(), cache_hash.c line 157:#012 Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = (hcb_cleaner)#012ident = Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012 0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012 0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012 0xb7e336de: _end+afd76eb2#012
Mar 19 16:51:19 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:51:19 wc2 tvan[13699]: child (12099) Started
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Closed fds: 4 5 6 9 10 12 13
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said Child starts
Mar 19 16:51:19 wc2 tvan[13699]: Child (12099) said managed to mmap 1073741824 bytes of 1073741824
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) died signal=6
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) Panic message: Assert error in HSH_DeleteObjHead(), cache_hash.c line 157:#012 Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = (hcb_cleaner)#012ident = Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012 0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012 0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012 0xb7e336de: _end+afd76eb2#012
Mar 19 16:57:30 wc2 tvan[13699]: Child cleanup complete
Mar 19 16:57:30 wc2 tvan[13699]: child (12372) Started
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Closed fds: 4 5 6 9 10 12 13
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said Child starts
Mar 19 16:57:30 wc2 tvan[13699]: Child (12372) said managed to mmap 1073741824 bytes of 1073741824
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) died signal=6
Mar 19 17:03:43 wc2 tvan[13699]: Child (12372) Panic message: Assert error in HSH_DeleteObjHead(), cache_hash.c line 157:#012 Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = (hcb_cleaner)#012ident = Linux,2.6.22.10-NMG,i686,-sfile,-hcritbit,epoll#012Backtrace:#012 0x806e382: pan_ic+d2#012 0x80663b7: HSH_DeleteObjHead+97#012 0x807c907: hcb_cleaner+c7#012 0xb7ef34c0: _end+afe36c94#012 0xb7e336de: _end+afd76eb2#012
Mar 19 17:03:43 wc2 tvan[13699]: Child cleanup complete
Mar 19 17:03:43 wc2 tvan[13699]: child (12750) Started
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Closed fds: 4 5 6 9 10 12 13
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said Child starts
Mar 19 17:03:43 wc2 tvan[13699]: Child (12750) said managed to mmap 1073741824 bytes of 1073741824
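
The panic lines above are hard to read because syslog has escaped each newline as #012. A minimal Python sketch along the following lines could unescape the panic message and measure the time between child deaths. The function names are hypothetical, and the sample text is two "died" lines and one shortened panic line copied from the excerpt above; this is an illustration, not an existing varnish tool.

```python
import re
from datetime import datetime

# Shortened sample in the same shape as the syslog excerpt above.
SAMPLE = """\
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) died signal=6
Mar 19 16:51:19 wc2 tvan[13699]: Child (11787) Panic message: Assert error in HSH_DeleteObjHead(), cache_hash.c line 157:#012 Condition(VTAILQ_EMPTY(&oh->waitinglist)) not true.#012thread = (hcb_cleaner)#012
Mar 19 16:57:30 wc2 tvan[13699]: Child (12099) died signal=6
"""

# Matches "<timestamp> ... Child (<pid>) died signal=<n>".
DIED = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*Child \((\d+)\) died signal=(\d+)")


def decode_syslog_escapes(text):
    """Turn #NNN octal escapes (such as #012 for newline) back into characters."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)


def crash_times(log_text):
    """Return the timestamp of every 'Child (...) died' line, in file order."""
    times = []
    for line in log_text.splitlines():
        match = DIED.match(line)
        if match:
            # Syslog timestamps carry no year, so strptime defaults to 1900.
            # Only the differences between timestamps are used here.
            times.append(datetime.strptime(match.group(1), "%b %d %H:%M:%S"))
    return times


def crash_intervals(log_text):
    """Return the seconds elapsed between consecutive child deaths."""
    times = crash_times(log_text)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        if "Panic message:" in line:
            print(decode_syslog_escapes(line))
    print("seconds between crashes:", crash_intervals(SAMPLE))
```

Run against the full log, this would show whether the crashes really are evenly spaced, which might hint at a periodic job (such as the hcb_cleaner thread named in the panic) rather than a traffic-driven trigger.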

I actually have two separate varnish instances running from the same binaries so that I can control them independently (and use different IP addresses). Both of them show the same behavior.

Any ideas on what I can do to troubleshoot this further?

Justin Pasher
