On 3/10/2016 1:03 p.m., Alex Rousskov wrote: > On 10/02/2016 03:25 PM, Kinkie wrote: >> On Fri, Sep 30, 2016 at 6:03 PM, Alex Rousskov >> <[email protected]> wrote: >>> On 09/29/2016 09:19 PM, Amos Jeffries wrote: >>>> On 30/09/2016 5:03 a.m., Alex Rousskov wrote: >>>>> Should we remove the increment to make concurrent c_str() calls safe? >>> >>>> The reason it exists remember is to prevent other SBuf sharing that >>>> storage MemBuf from thinking they can append to the storage oer top of >>>> the terminator. >>> >>> Yes, I know, but that increment not only prevents problems but _causes_ >>> them as well: The increment may force the SBuf to realloc, which may >>> result in deallocation of the original storage still pointed to by the >>> earlier c_str() result. >>> >>> >>>> Given that we have no easy knowledge of how the c_str() >>>> is going to be used by the recipient code we cannot guarantee lack of >>>> terminator is safe. >>> >>> Yes, with or without the increment, Squid may crash or misbehave when >>> buggy code appends to SBuf while a previous c_str() result is still in >>> use. The lack of increment is not safe. The presence of the increment is >>> not safe. The problems are different, but they exist with or without the >>> increment. >>> >>> >>>> Time to push SBuf usage forward and aggressively remove the need for >>>> c_str()? >>> >>> The need for c_str() will never go away because libraries outside of >>> Squid control need c-strings. And, due to the lack of proper QA, any >>> "aggressive push" affecting lots of Squid code is a recipe for a >>> disaster IMO. >>> >>> The question is which c_str() implementation is safer? The one that may >>> crash Squid when called twice for the same SBuf in the same expression >>> or the one that may crash Squid when appending to the same SBuf in the >>> same expression? There are already two known cases of the former. There >>> are currently no known cases of the latter. >>> >>> >>> Overall, I know of three primary ways to implement c_str(): >>> >>> 1. Always 0-terminate SBuf-used storage. Safest implementation possible, >>> but has serious negative performance overheads, especially for tokenizing. >> >> If this is the decision, might as well drop sbuf altogether in favor >> of std::string (or possibly std::string + custom mempools-friendly >> Allocator). > > You are probably right, although std::string may have other funny things > going that we might want to avoid/control.
Yes. SBuf is pointless without the 0-copy property which is its primary use-case. If every use of c_str() is going to cow() and write a terminator, then the performance takes a huge hit until such time as we have rolled SBuf out widely. If you want to improve performance, we have to make a call on dropping the 0-copy goal or pushing forward with it ASAP. The argument that library APIs etc need c_str() is bogus. We can implement a different safer c-string conversion or for the nicer libraries (like most of libc or STL) pass them unterminated (char*, size) tuplets. > >>> 2. Terminate the used storage at the time of c_str(). >>> >>> 2a. Increment the used storage size. Current implementation. >>> Leads to problems discussed in this thread. >>> >>> 2b. Do not increment the used storage size. >>> Leads to other problems discussed in this thread. >>> >>> 3. Allocate and store a new c-string as needed, copying from the >>> storage. Safest implementation possible but has serious negative >>> performance overheads. These overheads affect different use cases than >>> those of #1. > >> If we are aware of the underlying issue, there's no reason not to save >> the temporary to a naked char* and use that, as long as the SBuf is >> not touched it'll work fine. > > Save to a temporary variable outside SBuf? That is not a solution > because we are trying to address the cases where the developer forgot to > do that or could not really tell that doing that is necessary. Anything > from trivial/direct > > foo(buf.c_str(), buf.c_str()); > > to complex/hidden > > foo(buf, buf.c_str()) which calls bar(buf1.c_str(), cstr2); > Thats not what comes to mind for me with kinkies description. I imagined a cached const char* pointer inside SBuf that c_str() produces if it is unset. The cow() operations freeing that cached pointer. Anyhow, see new option (vi) proposal at the end. > >>> Needless to say, we can add clever code to reduce risks associated with >>> some implementations. For example (these are just rough sketches/ideas): >>> >>> a) Appending code in #2b can check whether it is about to overwrite the >>> 0-terminator added by a previous c_str() call and, if yes, preserve the >>> old storage (we can preserve one such storage or maintain a global FIFO >>> queue). >>> >>> b) C-string allocating code in #4 can maintain a global FIFO queue of >>> previously allocated but now supposedly unused c-strings to minimize the >>> risk of de-allocating a still used c-string. >>> >>> c) C-string allocating code in #4 can delay allocation until it might be >>> needed, similar to how (a) decides that the 0-terminator is in danger. >>> >>> >>> Going forward, do you think we should: >>> >>> i. Keep using #2a and rewrite the known double-c_str() code. That is two options. I think we should fix the double-c_str() code regardless of any other changes. These are definitely performance bugs with the 2nd call triggeringd reallocate regardless of other bad side effects. >>> ii. Make double-c_str() code safe (i.e., switch to #2b). >>> iii. Add code to make the existing #2 implementation safer. >>> iiii. Switch to a completely different implementation like #1 or #4. > >> v. push SBuf as suggested by Amos and document this specific corner >> case so to avoid it > > I do not think "push SBuf" is a comparable alternative. "Pushing SBuf" > is not something we can safely do in a reasonable time. It is and > remains a long-term goal, but we need to address the problem "now". The > offered alternatives can be implemented in a matter of hours or weeks. > "Pushing SBuf" safely will probably take a year and will not eliminate > all c_str() calls anyway so the same problems/alternatives will remain > even after the "push" is over. There are very few things we can do will resolve this in such a short timeframe. That is a natural result of the code being so large and complex. Any choice we make to alter the design will take a long time to check every single code path works okay. Another option is: vi. drop c_str() completely and use SBufToCstring() or SBufToString() as ways to obtain char*. Adjusting callers as necessary for those not to leak memory or be too bad for performance will be easier than a full code audit. Just looking at it there are 56 callers documented by doxygen in the layer-02-maximus build for 4.x. I'd estimate perhapse a dozen that are not. Amos _______________________________________________ squid-dev mailing list [email protected] http://lists.squid-cache.org/listinfo/squid-dev
