I did a bzr update just now, to commit, and I see that this problem is fixed :-) Please ignore....
On 04/08/2014 06:48 PM, Tsantilas Christos wrote: > The problem is that the peer_select code trying to access a std::vector > which is already destroyed. > If we call peerSelect for fwd FwdState object eg: > peerSelect(&fwd->serverDestinations, ..., fwd); > > and the fwd state object become invalid (eg because client closed the > connection) then it is possible to see this crash. > > I believe the fix for these cases is easy, we need to add the following > code at the beggining of peerSelectDnsResults and peerSelectDnsPaths > methods (and any other method uses the vector): > > if (!cbdataReferenceValid(psstate->callback_data)) { > delete psstate; > return; > } > > Regards, > Christos > > On 03/26/2014 11:13 AM, Kinkie wrote: >> And instead.. >> the code testing Alex's suspects triggered in peer_select.cc:337, this >> is the relevant snippet of stack trace: >> >> >> #3 0x0000000000632f2d in Vector<RefCount<Comm::Connection> >::size ( >> this=<optimized out>, this=<optimized out>) at ../src/base/Vector.h:355 >> #4 0x0000000000638a45 in peerSelectDnsResults (ia=0x29ff8b0, details=..., >> data=0x2cf59b0) at peer_select.cc:337 >> #5 0x000000000060d4e6 in ipcacheCallback (i=i@entry=0x29ff890, >> wait=wait@entry=19) at ipcache.cc:347 >> #6 0x000000000060dbd4 in ipcacheHandleReply (data=<optimized out>, >> answers=<optimized out>, na=<optimized out>, error_message=0x0) >> at ipcache.cc:497 >> #7 0x000000000058b357 in idnsCallback (q=0x24eec80, q@entry=0x24f5370, >> error=error@entry=0x0) at dns_internal.cc:1127 >> >> >> I'm currently investigating more in depth. >> >> On Wed, Mar 26, 2014 at 8:32 AM, Kinkie <gkin...@gmail.com> wrote: >>> Hi, >>> I think I have isolated the changelet in FwdState::connectionList. >>> However WHAT in that changelet may be causing the crashes is still >>> unknown. The hunt for iterate-after-clear has so far turned up >>> nothing. >>> >>> On Wed, Mar 26, 2014 at 8:01 AM, Niki Gorchilov <n...@gorchilov.com> wrote: >>>> Hello there, >>>> >>>> Is there any progress regarding the random coredumps related to >>>> std::vector migration? >>>> >>>> Best, >>>> Niki >>>> >>>> On Mon, Mar 17, 2014 at 6:41 PM, Kinkie <gkin...@gmail.com> wrote: >>>>> Yes, I will. >>>>> >>>>> On Mon, Mar 17, 2014 at 5:24 PM, Alex Rousskov >>>>> <rouss...@measurement-factory.com> wrote: >>>>>> Hello, >>>>>> >>>>>> I have discovered one difference between std::vector and Squid's own >>>>>> Vector implementation that might explain some of the random crashes that >>>>>> some of us have been seeing after the Vector class was replaced with >>>>>> std::vector in trunk. >>>>>> >>>>>> Squid Vector sets the number of stored elements to zero on destruction. >>>>>> Std::vector does not do that. Here is the output of a simple test code >>>>>> quoted in the postscriptum: >>>>>> >>>>>>> alive std::vector size: 1 >>>>>>> deleted std::vector size: 3 >>>>>>> alive Squid Vector size: 1 >>>>>>> deleted Squid Vector size: 0 >>>>>> >>>>>> Both vectors behave correctly, but std::vector code is optimized not to >>>>>> do extra work such as maintaining the size member. This means that >>>>>> iterating a deleted Squid Vector is often safe (until the freed memory >>>>>> is overwritten) because the broken caller code would just discover that >>>>>> there are no items in the vector and move on. >>>>>> >>>>>> Iterating the deleted std::vector usually leads to crashes because all >>>>>> iteration-related methods such as size() rely on immediately deleted and >>>>>> changed pointers (std::vector does not store its size as a data member >>>>>> but uses memory pointers to compute the number of stored elements). >>>>>> >>>>>> This difference leads to std::vector migration problems such as this >>>>>> trivially reproducible one: >>>>>> >>>>>>> Adaptation::AccessRules & >>>>>>> Adaptation::AllRules() >>>>>>> { >>>>>>> static AccessRules TheRules; >>>>>>> return TheRules; >>>>>>> } >>>>>> >>>>>> After TheRules array is deallocated during global destructions, >>>>>> iterating AllRules() becomes unsafe. The old Vector-based code usually >>>>>> worked because deallocated TheRules had zero size. >>>>>> >>>>>> The specific bug mentioned above is trivial to work around by allocating >>>>>> TheRules dynamically (and never deleting it). However, if similar bugs >>>>>> exist in other code where Vector is used as a data member of some >>>>>> transaction-related structure, then they will lead to crashes. It is >>>>>> quite possible that the affected structure's memory itself is never >>>>>> deleted when the offending code accesses it (due to cbdata locks, for >>>>>> example) so the [equally buggy] Vector-based code "works". >>>>>> >>>>>> One way we can try to catch such cases is by temporary switching back to >>>>>> Vector and then: >>>>>> >>>>>> * Modifying Vector to mark deleted Vector objects: >>>>>> >>>>>> Vector::~Vector() { >>>>>> clean(); >>>>>> deleted = true; >>>>>> } >>>>>> >>>>>> * And adjusting *every* Vector method to assert if a deleted object is >>>>>> still being used. For example: >>>>>> >>>>>> template<class E> >>>>>> size_t >>>>>> Vector<E>::size() const >>>>>> { >>>>>> assert(!deleted); >>>>>> return count; >>>>>> } >>>>>> >>>>>> If one of those asserts is triggered, we would be closer to solving this >>>>>> mystery. >>>>>> >>>>>> >>>>>> Kinkie, if you agree with this analysis, would you be able to make the >>>>>> above modifications and test? >>>>>> >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Alex. >>>>>> >>>>>> >>>>>> Goes into Squid's main.cc: >>>>>>> +#include <vector> >>>>>>> +#include "base/Vector.h" >>>>>>> int >>>>>>> main(int argc, char **argv) >>>>>>> { >>>>>>> + typedef std::vector<int> StdVector; >>>>>>> + StdVector *stdv = new StdVector; >>>>>>> + stdv->push_back(1); >>>>>>> + std::cout << "alive std::vector size: " << stdv->size() << >>>>>>> "\n"; >>>>>>> + delete stdv; >>>>>>> + std::cout << "deleted std::vector size: " << stdv->size() << >>>>>>> "\n"; >>>>>>> + >>>>>>> + typedef Vector<int> SqdVector; >>>>>>> + SqdVector *sqdv = new SqdVector; >>>>>>> + sqdv->push_back(1); >>>>>>> + std::cout << "alive Squid Vector size: " << sqdv->size() << >>>>>>> "\n"; >>>>>>> + delete sqdv; >>>>>>> + std::cout << "deleted Squid Vector size: " << sqdv->size() << >>>>>>> "\n"; >>>>>>> + if (true) return 0; >>>>>>> return SquidMainSafe(argc, argv); >>>>>>> } >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Francesco >>> >>> >>> >>> -- >>> Francesco >> >> >> > > -- Tsantilas Christos Network and Systems Engineer email:chris...@chtsanti.net web:http://www.chtsanti.net Phone:+30 6977678842