On 09/17/2011 10:42 AM, Alex Rousskov wrote: > On 09/17/2011 02:01 AM, Amos Jeffries wrote: >> On 16/09/11 16:29, Alex Rousskov wrote: >>> On 09/15/2011 08:24 PM, Amos Jeffries wrote: >>>> On 16/09/11 06:16, Alex Rousskov wrote: >>>>> ------------------------------------------------------------ >>>>> revno: 11740 >>>>> committer: Alex Rousskov<[email protected]> >>>>> branch nick: trunk >>>>> timestamp: Thu 2011-09-15 12:16:59 -0600 >>>>> message: >>>>> Do not let cache manager requests kill SMP Squid using isOpen() >>>>> assertion. >>>>> >>>>> As the comment above the close call implies, we have not imported >>>>> the foreign socket descriptor into our fd_table yet. We must use >>>>> raw close(2), just like the corresponding >>>>> Mgr::Request::Request(msg) code that allocates request.conn, uses >>>>> raw assignment to give that half-baked connection a descriptor. >>>>> >>>>> TODO: This direct manipulation of Connection::fd is ugly, and this >>>>> half-baked connection will most likely cause more [hidden] problems >>>>> down the road. For example, Mgr::Request destructor will assert in >>>>> a similar way if the request object is destroyed before >>>>> Action::respond() is called.>> modified: >>>>> src/mgr/Action.cc >>> >>> >>>> Problem is: why is the FD getting outside the IPC Coordinator/Strand >>>> scope when it is not "valid" according to fd_table? >>> >>> >>> It is not 100% clear where that IPC scope ends in this case, but the >>> problem is not really with the scope borders. The bug was due to the >>> [old] action creator saying "we are not going to import this raw fd so >>> that each specific action can decide whether it actually needs an fd" >>> and the [new] default action code saying "we are going to close this >>> Comm connection" even though the Connection object was in that >>> half-baked state. >>> >>> >>>> - If it can't be read/write/close, it should not really be passed >>>> round >>>> at all. >>>> - If it *can* be read/write/close, the ->close() is the right way to >>>> close and remove the possible handlers. >>> >>> It can be read/written/closed, but it can be read/written/closed using >>> Comm only after the fd is imported into Comm space. The old code did not >>> import and used raw close(2) by default. The new code still did not >>> import but used Comm close by default. >>> >>> >>>> Correlation of these is that the code doing conn->fd = msg.getFd() when >>>> unpacking the TypedMsgHdr should be (a) not, or (b) setting up fd_table >>>> state sufficiently to match the Comm::Connection state. >>> >> >> >> Option (1) >> >>> Yes, that is why I left a TODO note: We can change the old code to >>> import the fd into Comm [and create a full Connection object] as soon as >>> possible. It would be a waste of cycles in most cases, but it will >>> prevent half-baked Connection objects from lingering around and causing >>> more asserts or fd leaks. >>> >> >> option (2) >>> The alternative is to use a "raw fd holder" class/object that would use >>> raw close(2) unless a fully-baked Connection object is created. SSL code >>> uses that approach for many temporary SSL things to avoid leaks. >> >> option (3): >> Comm::Connection::close() could detect the fd_table flag which is >> asserting right? then it could use ::close() instead of comm_close(). > > I do not like option (3) because it will prevent us from detecting bugs > where a regular fd is closed twice (from two Connection objects) and > other code logic problems. > > >> option (4): >> Or probably, add another Comm::Connection flag to indicate "partial >> import" for the opening TCP flags. Which triggers the particular closing >> method if still set on ->close(). > > >>> Since this is not a performance-sensitive area of the code, the former >>> (i.e., create a Connection object ASAP) may be better/simpler. >> >> Well, like you keep reminding me. Everything is performance sensitive >> indirectly. > > I think that would be an exaggeration. It is difficult to imagine a > realistic scenario where we need to super-optimize cache manager > queries, but if such a need does surface, we can always migrate from > simpler to more complex (option #1 to another option). > > >> So I'm thinking option (4), with option (1) in specific cases would be >> best. >> Option (3) whenever we fiddle with the fd directly. Option (1) >> whenever its known that the FD will be used. Possibly some time after >> option (3) was done, so unsetting the half-imported flag if set earlier. >> This way we avoid fd_table and related CPU consumptions entirely for >> most of these. And still roughly follow the semantics of what the flags >> mean: things done when opening this FD, so we can repeat the sequence in >> comm_reset() as needed. > > I think we should keep it simple and use Option #1 for all cases. The > fewer half-baked things we have, the better (this is not limited to > asserts when closing a descriptor). If you insist, I can live with > option #4. > > > BTW, I think Ipc::ImportFdIntoComm() is now broken as well. If things go > wrong, it calls conn->close() for a half-baked connection. If you work > on this, please try to find all cases where Connection code changed > close(2) to conn->close(). There may be more such bugs lurking there.
Found another similar assertion which exposed a related but different problem. Assertion: When Coordinator gets an IPC cache manager request for an action that Coordinator cannot find, it asserts because there is no Action::respond() call to close the half-baked connection. Both options #1 and #4 would solve this. Problem: Coordinator must know of all actions known to workers because Coordinator cannot aggregate cache manager responses without an action-specific Action object. Currently, there are actions Coordinator does not know about but workers do (e.g., mgr:sourcehash). This is probably due to some modules not being initialized in Coordinator because it does not need them for any other purpose. I can try to take care of the above Problem. Will you work on implementing option #1 (or #4) to get rid of assertions? Thank you, Alex. P.S. We also do not handle Coordinator asserting/quitting very well, but that is a very different problem outside this thread scope.
