This is probably what's killing my customer up north: > % varnishd -a localhost:8080 -b localhost:80 -sfile,/tmp,4m -d -d
[note that there is no server at localhost:80] > file /tmp/varnish.TIGksM (unlinked) size 4194304 bytes (1024 fs-blocks, 1024 > pages) > Using old SHMFILE > rolling(1)... > rolling(2)... > start > CLI <start> > start child pid 7023 > 200 0 > > Child said (2, 7023): <<Child starts > managed to mmap 4194304 bytes of 4194304 > Ready > CLI ready > >> [now point firefox at http://localhost:8080/] > Child said (2, 7023): <<socktest: linger=0 sndtimeo=0 rcvtimeo=0 > >> > Child not responding to ping [because it is busy dumping core] > Cache child died pid=7023 status=0x8b [0x8b = SIGSEGV] > Clean child > Child cleaned > start child pid 7039 > Child said (2, 7039): <<Child starts > managed to mmap 4194304 bytes of 4194304 > Ready > CLI ready > >> > Manager got SIGINT > Clean child > Child stopping > unlink /tmp/vcl.XX9MHO5n [let's look at that core file] > % gdb =varnishd ~/core > GNU gdb 6.6-debian > Copyright (C) 2006 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu"... > Using host libthread_db library "/lib/libthread_db.so.1". > Error while mapping shared library sections: > /tmp/vcl.XX9MHO5n: No such file or directory. > Reading symbols from /opt/varnish/lib/libvarnish.so.0...done. > Loaded symbols for /opt/varnish/lib/libvarnish.so.0 > Reading symbols from /opt/varnish/lib/libvcl.so.0...done. > Loaded symbols for /opt/varnish/lib/libvcl.so.0 > Reading symbols from /lib/libdl.so.2...done. > Loaded symbols for /lib/libdl.so.2 > Reading symbols from /lib/librt.so.1...done. > Loaded symbols for /lib/librt.so.1 > Reading symbols from /lib/libpthread.so.0...done. > Loaded symbols for /lib/libpthread.so.0 > Reading symbols from /lib/libc.so.6...done. > Loaded symbols for /lib/libc.so.6 > Reading symbols from /lib/ld-linux-x86-64.so.2...done. > Loaded symbols for /lib64/ld-linux-x86-64.so.2 > Reading symbols from /lib/libnss_files.so.2...done. > Loaded symbols for /lib/libnss_files.so.2 > Symbol file not found for /tmp/vcl.XX9MHO5n > Core was generated by `varnishd -a localhost:8080 -b localhost:80 > -sfile,/tmp,4m -d -d'. > Program terminated with signal 11, Segmentation fault. > #0 0x00002afef513d3d9 in pthread_mutex_lock () from /lib/libpthread.so.0 > (gdb) where > #0 0x00002afef513d3d9 in pthread_mutex_lock () from /lib/libpthread.so.0 > #1 0x000000000040cfa3 in HSH_Unbusy (o=0x64e840) at cache_hash.c:195 > #2 0x00000000004095fc in cnt_fetch (sp=0x64c308) at cache_center.c:292 > #3 0x000000000040a428 in CNT_Session (sp=0x64c308) at steps.h:40 > #4 0x0000000000411c2b in wrk_do_one (w=0x43002c50) at cache_pool.c:191 > #5 0x0000000000411fe7 in wrk_thread (priv=0x632340) at cache_pool.c:231 > #6 0x00002afef513b2a5 in start_thread () from /lib/libpthread.so.0 > #7 0x00002afef541f61d in clone () from /lib/libc.so.6 > #8 0x0000000000000000 in ?? () > (gdb) up > #1 0x000000000040cfa3 in HSH_Unbusy (o=0x64e840) at cache_hash.c:195 > 195 LOCK(&o->objhead->mtx); > (gdb) l > 190 CHECK_OBJ_NOTNULL(o, OBJECT_MAGIC); > 191 assert(o->busy); > 192 assert(o->refcnt > 0); > 193 if (o->cacheable) > 194 EXP_Insert(o); > 195 LOCK(&o->objhead->mtx); > 196 o->busy = 0; > 197 UNLOCK(&o->objhead->mtx); > 198 while (1) { > 199 sp = TAILQ_FIRST(&o->waitinglist); > (gdb) p *o > $1 = {magic = 847584578, refcnt = 1, xid = 780407261, objhead = 0x0, > heap_idx = 0, ban_seq = 0, pass = 0, response = 0, valid = 0, cacheable = 0, > busy = 1, len = 0, age = 0, entered = 0, ttl = 0, last_modified = 0, http = > { > magic = 1680389577, s = 0x0, t = 0x0, v = 0x0, f = 0x0, e = 0x0, > conds = 0 '\0', logtag = HTTP_Rx, hd = {{b = 0x0, > e = 0x0} <repeats 32 times>}, hdf = '\0' <repeats 31 times>, nhd = 0}, > list = {tqe_next = 0x0, tqe_prev = 0x0}, deathrow = {tqe_next = 0x0, > tqe_prev = 0x0}, store = {tqh_first = 0x0, tqh_last = 0x64eb18}, > waitinglist = {tqh_first = 0x0, tqh_last = 0x64eb28}} There is no o->objhead because no object was ever retrieved and Fetch() returned early. Why is o->busy protected by o->objhead->mtx? Why not have a mutex directly in struct object instead? DES -- Dag-Erling Smørgrav Senior Software Developer Linpro AS - www.linpro.no _______________________________________________ varnish-dev mailing list [email protected] http://projects.linpro.no/mailman/listinfo/varnish-dev
