Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
Agreed. Thanks

On Dec 11, 2015 17:41, "Alex Rousskov" wrote:
> On 12/10/2015 02:23 PM, Kinkie wrote:
> > On Thu, Dec 10, 2015 at 10:07 PM, Alex Rousskov wrote:
> >> On 12/10/2015 11:25 AM, Francesco Chemolli wrote:
> >>> And it is actually pretty detectable: let’s choose an arbitrary size
> >>> of the shared segment and output a warning message to cache.log. This
> >>> won’t help with the speed, but it might help with the confusion..
>
> >> 2015/12/09 11:24:51| Total mlock() delay exceeded 5.3 seconds. See
> >> shared_memory_locking in squid.conf.documented.
>
> > I was thinking more of something _before_ the delay:
> > e.g. if shared memory is > 100mb, then
> > "locking XXX mbytes of shared memory. On some systems this may require
> > a long time. Squid is not dead, it's just thinking"
>
> I think the post-event warning is better:
>
> * It is difficult to guess which segment sizes will cause significant
> delays in a given environment.
>
> * Squid allocates many segments and a few large segments. Warning the
> admin before each large allocation would create a lot of noise. Not
> warning the admin before some large allocations would kind of defeat the
> idea of the preemptive warning itself.
>
> * There is a possibility that the significant (from admin point of view)
> startup delay is caused by the _cumulative_ effect of locking various
> shared memory segments and not by any single segment.
>
> * Warning folks about something that has not happened yet and may never
> happen increases log noise. In this particular case, we _can_ warn when
> we know that startup was significantly delayed because of locking,
> reducing that noise.
>
> * There is a good chance that somebody concerned about startup delays
> will notice the first line printed _after_ the delay.
>
> Are you sure that preemptive per-segment warnings are better than an
> acknowledgement of the problem after it happens? Do you think the latter
> is likely to be missed by admins investigating startup delays?
>
> > and then when it's done "shared memory locking of X Mb done in Y
> > seconds. This may be normal depending on the size of the shared memory
> > area".
>
> I would rather put explanations in squid.conf.documented so that we do
> not have to risk misleading folks when discussing complex issues using a
> one-line statement. Besides, we need to point folks to options
> controlling this behavior (in case they do not think that slow startup
> is "normal" for them).
>
> Thank you,
>
> Alex.

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 12/10/2015 02:23 PM, Kinkie wrote:
> On Thu, Dec 10, 2015 at 10:07 PM, Alex Rousskov wrote:
>> On 12/10/2015 11:25 AM, Francesco Chemolli wrote:
>>> And it is actually pretty detectable: let’s choose an arbitrary size
>>> of the shared segment and output a warning message to cache.log. This
>>> won’t help with the speed, but it might help with the confusion..

>> 2015/12/09 11:24:51| Total mlock() delay exceeded 5.3 seconds. See
>> shared_memory_locking in squid.conf.documented.

> I was thinking more of something _before_ the delay:
> e.g. if shared memory is > 100mb, then
> "locking XXX mbytes of shared memory. On some systems this may require
> a long time. Squid is not dead, it's just thinking"

I think the post-event warning is better:

* It is difficult to guess which segment sizes will cause significant
delays in a given environment.

* Squid allocates many segments and a few large segments. Warning the
admin before each large allocation would create a lot of noise. Not
warning the admin before some large allocations would kind of defeat the
idea of the preemptive warning itself.

* There is a possibility that the significant (from admin point of view)
startup delay is caused by the _cumulative_ effect of locking various
shared memory segments and not by any single segment.

* Warning folks about something that has not happened yet and may never
happen increases log noise. In this particular case, we _can_ warn when
we know that startup was significantly delayed because of locking,
reducing that noise.

* There is a good chance that somebody concerned about startup delays
will notice the first line printed _after_ the delay.

Are you sure that preemptive per-segment warnings are better than an
acknowledgement of the problem after it happens? Do you think the latter
is likely to be missed by admins investigating startup delays?

> and then when it's done "shared memory locking of X Mb done in Y
> seconds. This may be normal depending on the size of the shared memory
> area".

I would rather put explanations in squid.conf.documented so that we do
not have to risk misleading folks when discussing complex issues using a
one-line statement. Besides, we need to point folks to options
controlling this behavior (in case they do not think that slow startup
is "normal" for them).

Thank you,

Alex.
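For readers following the thread later, the squid.conf.documented text Alex refers to might look something like the sketch below. This is illustrative only: the directive name comes from the warning message quoted earlier in the thread, but the syntax, wording, and default shown here are assumptions, not shipped Squid documentation.

```
#  TAG: shared_memory_locking	on|off
#	Whether to ensure that Squid's shared memory pages are really
#	available in RAM by locking them (via mlock(2)) during startup.
#
#	Locking large shared memory areas can slow Squid startup
#	considerably, but it converts hard-to-diagnose runtime crashes
#	(such as SIGBUS on first use of an overcommitted page) into an
#	immediate, visible startup error.
#Default:
# shared_memory_locking on
```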
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 11/12/2015 5:06 a.m., Alex Rousskov wrote:
> On 12/10/2015 01:53 AM, Amos Jeffries wrote:
>> On 10/12/2015 1:24 p.m., Alex Rousskov wrote:
>>> On 12/09/2015 04:24 PM, Amos Jeffries wrote:
>>>> On 10/12/2015 10:49 a.m., Alex Rousskov wrote:
>>> Questions: Should we add a configuration directive to control whether
>>> mlock(2) is called? If yes, should mlock(2) be called by default?
>>> Should mlock(2) failures be fatal?
>
>> My answers are no, yes, yes. Since this is a startup thing squid.conf
>> post-processing is probably too late.
>
>> It makes them "maybe", no, yes.
>
>> Though I am still not convinced exactly that squid.conf is the right
>> place. As I said before:
>> "
>> In general if a setting is not something that can have a reasonable
>> temporary default for a while, then be switched around in realtime. Then
>> squid.conf is not a good place for it.
>
> Yes, you said that, but (a) your precondition still does not quite match
> the situation at hand:
>
> * The "perform mlock" setting _has_ a reasonable default (i.e., yes, lock),
>
> * that default can be overwritten to "do not lock" if the admin does not
> want to lock, and
>
> * the setting can even be switched around during reconfiguration (after
> somebody adds support for reconfigurable shared storage, although the
> change from no-lock to lock can be supported earlier);
>
> and (b) I disagree with the overall decision logic you are proposing:
> IMO, command line options should be limited to these two areas:
>
> i. controlling Squid behavior before squid.conf is parsed.
> ii. controlling another Squid instance (for legacy reasons).
>
> Memory locking controls do not belong to the command line. And yes, many
> current command line options violate the above, but that does not make
> those violations good ideas and does not mean we should add more of them.
>
>> I would make it mandatory on "-k check" (but not -k parse). Optional
>> through a command line parameter on all other runs.
>
> I have two problems with the -k check approach:
>
> 1. -k check is documented to parse configuration and send a signal to
> the already running Squid. To implement what you are proposing would
> require also initializing Squid storage. Initializing Squid storage when
> another Squid instance is running is either impossible (because it will
> conflict with the existing instance) or would require highly
> customized code (to work around those conflicts) that will not fully
> test what a freshly started Squid would experience. Finally, locking
> memory on -k check may also swap out or even crash the running Squid
> instance!
>
> 2. Since most folks do not run -k check, they will not be aware that
> their Squid or OS is misconfigured and will suffer from the obscure
> effects of that misconfiguration such as SIGBUS and segmentation fault
> deaths.
>
> Problem #2 affects the "Optional through a command line parameter on all
> other runs" part as well.
>
>> The other (default / normal) runs having it OFF by default, since the -k
>> check should have confirmed already that it is going to work.
>
> Yes, but I do not think we should use -k check, and we cannot rely on
> it, as discussed in #1 and #2 above. If we want to avoid these SIGBUS
> problems in old and new configurations, we should lock by default. I see
> no other way.
>
> We may, of course, decide that SIGBUS problems are not significant
> enough to be worth avoiding, but it feels wrong to ship a proxy that can
> easily crash in its default configuration just because we decided to
> avoid a check preventing such crashes. I would rather ship a proxy that
> starts slow when explicitly configured to use lots of shared memory.

Okay. Good points, and some very good ones.

So it comes down to doing it your way, and we will have to come up with
something to explain to confused admins why their proxies are now slow.

If we really have no choice in the matter (and you make a good case for
that being so), then requiring an answer up front on what we will say is
not useful, just blocking progress. It is hard to pick any answer or
explanation in advance of knowing how bad the problem is (if at all),
and I don't think we should spend time trying to.

So is there anything else you need to discuss? Or shall we wind up this
thread and wait to see what appears for audit?

Amos
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 10/12/2015 1:24 p.m., Alex Rousskov wrote:
> On 12/09/2015 04:24 PM, Amos Jeffries wrote:
>> On 10/12/2015 10:49 a.m., Alex Rousskov wrote:
>>> Questions: Should we add a configuration directive to control whether
>>> mlock(2) is called? If yes, should mlock(2) be called by default?
>>> Should mlock(2) failures be fatal?
>
>> My answers are no, yes, yes. Since this is a startup thing squid.conf
>> post-processing is probably too late.
>
> Shared memory is not allocated until _after_ we parse squid.conf.
>
>> Ah. Okay.
>
> Does this understanding change your no/yes/yes answers?

It makes them "maybe", no, yes.

Though I am still not convinced exactly that squid.conf is the right
place. As I said before:
"
In general if a setting is not something that can have a reasonable
temporary default for a while, then be switched around in realtime. Then
squid.conf is not a good place for it.

I would make it mandatory on "-k check" (but not -k parse). Optional
through a command line parameter on all other runs. (Remember we have
long-args possible now so no need to squeeze it into a gap.)
"

Which is *not* doing mlock by default (retaining current behaviour). The
slowdown then only happens on -k check, which is the manual test to
check for problems. If mlock has a problem, a FATAL debugs (maybe exit
too, but could keep going to find other errors) during the test run is
very appropriate.

The other (default / normal) runs having it OFF by default, since the -k
check should have confirmed already that it is going to work. Unless the
admin explicitly sets the config / command line option to ON. In which
case complaints of slowdown are unjustified, and they can be told to
disable again.

> My plan was to tell them to disable mlock() (via a new squid.conf
> directive) if all of the items below apply to them:
>
> (a) mlock() is successful now
> (b) they are reasonably sure no environment or configuration changes
> will make mlock() unsuccessful (if it is re-enabled) without them
> knowing about the problem
> (c) mlock() slows things down too much in their setup
> (d) they want to trade faster startup for micro delays during runtime

Which seems to me all the things a -k check execution run will be
testing for naturally. So if mlock errors are fatal on -k check, they
get alerted to fix it before the running process is signalled to restart
with the new config.

Amos
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 12/10/2015 11:25 AM, Francesco Chemolli wrote:
> Well, slow in starting up, not in operating, hopefully.

Yes, I expect runtime performance to negligibly(?) improve during warmup
(on correctly configured deployments) due to pre-allocation of all
shared memory used during runtime.

> And it is actually pretty detectable: let’s choose an arbitrary size
> of the shared segment and output a warning message to cache.log. This
> won’t help with the speed, but it might help with the confusion..

Excellent idea! And we can improve it by warning if the combined
mlock(2) _delay_ exceeds some hard-coded value.

I wonder what that warning should say to avoid a [usually wrong]
implication that memory locking should be turned off. How about
something like this:

"""
2015/12/09 11:24:51| Total mlock() delay exceeded 5.3 seconds. See
shared_memory_locking in squid.conf.documented.
"""

We can set level-1 reporting threshold at 5 or even 10 seconds. We do
not need to add a shared_memory_locking parameter to disable this
reporting (while still locking), do we?

Thank you,

Alex.
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On Thu, Dec 10, 2015 at 10:07 PM, Alex Rousskov wrote:
> On 12/10/2015 11:25 AM, Francesco Chemolli wrote:
>> Well, slow in starting up, not in operating, hopefully.
>
> Yes, I expect runtime performance to negligibly(?) improve during warmup
> (on correctly configured deployments) due to pre-allocation of all
> shared memory used during runtime.
>
>> And it is actually pretty detectable: let’s choose an arbitrary size
>> of the shared segment and output a warning message to cache.log. This
>> won’t help with the speed, but it might help with the confusion..
>
> Excellent idea! And we can improve it by warning if the combined
> mlock(2) _delay_ exceeds some hard-coded value.
>
> I wonder what that warning should say to avoid a [usually wrong]
> implication that memory locking should be turned off. How about
> something like this:
>
> """
> 2015/12/09 11:24:51| Total mlock() delay exceeded 5.3 seconds. See
> shared_memory_locking in squid.conf.documented.
> """
>
> We can set level-1 reporting threshold at 5 or even 10 seconds. We do
> not need to add a shared_memory_locking parameter to disable this
> reporting (while still locking), do we?

I was thinking more of something _before_ the delay:
e.g. if shared memory is > 100mb, then
"locking XXX mbytes of shared memory. On some systems this may require
a long time. Squid is not dead, it's just thinking"

and then when it's done "shared memory locking of X Mb done in Y
seconds. This may be normal depending on the size of the shared memory
area".

-- 
Francesco
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 10 Dec 2015, at 19:23, Amos Jeffries wrote:
> Okay. Good points, and some very good ones.
>
> So it comes down to doing it your way, and we will have to come up with
> something to explain to confused admins why their proxies are now slow.

Well, slow in starting up, not in operating, hopefully.

And it is actually pretty detectable: let’s choose an arbitrary size
of the shared segment and output a warning message to cache.log. This
won’t help with the speed, but it might help with the confusion..

Kinkie
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 12/10/2015 01:53 AM, Amos Jeffries wrote:
> On 10/12/2015 1:24 p.m., Alex Rousskov wrote:
>> On 12/09/2015 04:24 PM, Amos Jeffries wrote:
>>> On 10/12/2015 10:49 a.m., Alex Rousskov wrote:
>> Questions: Should we add a configuration directive to control whether
>> mlock(2) is called? If yes, should mlock(2) be called by default?
>> Should mlock(2) failures be fatal?

> My answers are no, yes, yes. Since this is a startup thing squid.conf
> post-processing is probably too late.

> It makes them "maybe", no, yes.

> Though I am still not convinced exactly that squid.conf is the right
> place. As I said before:
> "
> In general if a setting is not something that can have a reasonable
> temporary default for a while, then be switched around in realtime. Then
> squid.conf is not a good place for it.

Yes, you said that, but (a) your precondition still does not quite match
the situation at hand:

* The "perform mlock" setting _has_ a reasonable default (i.e., yes, lock),

* that default can be overwritten to "do not lock" if the admin does not
want to lock, and

* the setting can even be switched around during reconfiguration (after
somebody adds support for reconfigurable shared storage, although the
change from no-lock to lock can be supported earlier);

and (b) I disagree with the overall decision logic you are proposing:
IMO, command line options should be limited to these two areas:

i. controlling Squid behavior before squid.conf is parsed.
ii. controlling another Squid instance (for legacy reasons).

Memory locking controls do not belong to the command line. And yes, many
current command line options violate the above, but that does not make
those violations good ideas and does not mean we should add more of them.

> I would make it mandatory on "-k check" (but not -k parse). Optional
> through a command line parameter on all other runs.

I have two problems with the -k check approach:

1. -k check is documented to parse configuration and send a signal to
the already running Squid. To implement what you are proposing would
require also initializing Squid storage. Initializing Squid storage when
another Squid instance is running is either impossible (because it will
conflict with the existing instance) or would require highly customized
code (to work around those conflicts) that will not fully test what a
freshly started Squid would experience. Finally, locking memory on
-k check may also swap out or even crash the running Squid instance!

2. Since most folks do not run -k check, they will not be aware that
their Squid or OS is misconfigured and will suffer from the obscure
effects of that misconfiguration such as SIGBUS and segmentation fault
deaths.

Problem #2 affects the "Optional through a command line parameter on all
other runs" part as well.

> The other (default / normal) runs having it OFF by default, since the -k
> check should have confirmed already that it is going to work.

Yes, but I do not think we should use -k check, and we cannot rely on
it, as discussed in #1 and #2 above. If we want to avoid these SIGBUS
problems in old and new configurations, we should lock by default. I see
no other way.

We may, of course, decide that SIGBUS problems are not significant
enough to be worth avoiding, but it feels wrong to ship a proxy that can
easily crash in its default configuration just because we decided to
avoid a check preventing such crashes. I would rather ship a proxy that
starts slow when explicitly configured to use lots of shared memory.

Thank you,

Alex.
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 12/09/2015 02:19 PM, Amos Jeffries wrote:
>> Questions: Should we add a configuration directive to control whether
>> mlock(2) is called? If yes, should mlock(2) be called by default?
>> Should mlock(2) failures be fatal?
>
> My answers are no, yes, yes. Since this is a startup thing squid.conf
> post-processing is probably too late.

Not sure I understand. Shared memory is not allocated until _after_ we
parse squid.conf. The allocation code needs to know how much to allocate
and that depends on squid.conf. mlock() is a part of the shared memory
allocation code...

FWIW, in my description about startup delays, "startup" is everything
that happens before Squid is ready to handle requests.

Thank you,

Alex.
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 10/12/2015 7:42 a.m., Alex Rousskov wrote:
> Hello,
>
> The attached patch does four things:
>
> 1. Fixes shared memory initialization. The OS X portability fix (Bug
> 3805) is a gift that keeps on giving: After we fixed OS X compilation,
> we stopped initializing previously used shared memory after crashes!
> Search the patch for "truncate".
>
> 2. Fixes shared memory cleanup. One segment was not deleted when running
> Squid in non-daemon mode (-N). Search the patch for "UsingSmp".
>
> 3. Makes sure using shared memory does not lead to SIGBUS crashes.
> Search the patch for "mlock".
>
> 4. Adds tests to check whether shared memory is usable. Search the patch
> for "memory checks" and "shouldTest".
>
> My plan is:
>
> * Commit #1 and #2 fixes to trunk without review unless somebody
> requests one now: IMO, they are simple and non-controversial.

+1 for #1 bits. ... without the new "HERE" macros.

> * Remove/forget #4. These paranoid checks should not be needed after #3,
> which they have helped to test. We can always add them later if I am wrong.

Maybe make these optional. Or #if 0 them out for now.

At the very least the checks are broken on 32-bit machinery:

* Segment theSize is an off_t (64-bit unsigned).
 - fillTest() at least is storing its value into an 'int' (32-bit
signed) for the page count math.
 - maybe causing 32-bit systems with large-file support to have wrap
issues with 2GB+ signed/unsigned values.
 - the math needs to be done directly in off_t or with explicit uint64_t.
 - where conversion to int is required (for memset/memcpy?) it also
needs to do limit checking that the 64-bit value is (n < 2^31) before
assignment.

If you choose to drop the *Test() code entirely this is not an issue. If
you #if 0 it, the above should probably be fixed or mentioned as an XXX fix.

> * Discuss mlock() changes in #3 (below). Post the corresponding change
> for the proper review. The code adding the mlock(2) call itself is
> simple, but the side effects of this change are serious enough to
> warrant proper review IMO.
>
> Questions: Should we add a configuration directive to control whether
> mlock(2) is called? If yes, should mlock(2) be called by default?
> Should mlock(2) failures be fatal?
>
> My answers are three YESes: I propose to call mlock(2) by default (if
> available) to guarantee that mmapped memory is usable. I also propose to
> make mlock(2) failures fatal. I hesitate offering a configuration option
> to control this behavior, but I think we should offer it because mlock()
> causes startup delays. I will discuss the problem and explain my
> rationale below.
>
> AFAICT, this large delay is the only valid argument for making mlock()
> calls optional -- if I am sure that Squid has enough RAM, then I might
> want to trade one large startup delay for numerous tiny delays as the
> mmapped memory gets allocated and used.
>
> Needless to say, if we make mlock() calls optional, then some admins
> will disable them without giving Squid enough RAM and will then complain
> about mysterious crashes. On the other hand, all other factors being
> equal, we should provide tools for knowledgeable admins even if those
> tools may be misused by others.
>
> How would you answer the three questions above?

My answers are no, yes, yes. Since this is a startup thing squid.conf
post-processing is probably too late.

In general if a setting is not something that can have a reasonable
temporary default for a while, then be switched around in realtime. Then
squid.conf is not a good place for it.

I would make it mandatory on "-k check" (but not -k parse). Optional
through a command line parameter on all other runs. (Remember we have
long-args possible now so no need to squeeze it into a gap.)

Amos
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 10/12/2015 10:49 a.m., Alex Rousskov wrote:
> On 12/09/2015 02:19 PM, Amos Jeffries wrote:
>>> Questions: Should we add a configuration directive to control whether
>>> mlock(2) is called? If yes, should mlock(2) be called by default?
>>> Should mlock(2) failures be fatal?
>
>> My answers are no, yes, yes. Since this is a startup thing squid.conf
>> post-processing is probably too late.
>
> Not sure I understand. Shared memory is not allocated until _after_ we
> parse squid.conf. The allocation code needs to know how much to allocate
> and that depends on squid.conf. mlock() is a part of the shared memory
> allocation code...

Ah. Okay. I was thinking it was allocated early somewhere in the
pre-config setups; Mem::Init, storeFsInit(), Fs::Init(),
StoreFileSystem::SetupAllFs(), Store::Init().

> FWIW, in my description about startup delays, "startup" is everything
> that happens before Squid is ready to handle requests.

Nod.

Amos
Re: [squid-dev] [RFC] Fix shared memory initialization, cleanup. Ensure its usability.
On 12/09/2015 04:24 PM, Amos Jeffries wrote:
> On 10/12/2015 10:49 a.m., Alex Rousskov wrote:
>>>> Questions: Should we add a configuration directive to control whether
>>>> mlock(2) is called? If yes, should mlock(2) be called by default?
>>>> Should mlock(2) failures be fatal?
>
>>> My answers are no, yes, yes. Since this is a startup thing squid.conf
>>> post-processing is probably too late.
>
>> Shared memory is not allocated until _after_ we parse squid.conf.
>
> Ah. Okay.

Does this understanding change your no/yes/yes answers?

If your answers stay the same, what do you propose to tell admins that
will complain about significantly slower startup times after this fix?

My plan was to tell them to disable mlock() (via a new squid.conf
directive) if all of the items below apply to them:

(a) mlock() is successful now
(b) they are reasonably sure no environment or configuration changes
will make mlock() unsuccessful (if it is re-enabled) without them
knowing about the problem
(c) mlock() slows things down too much in their setup
(d) they want to trade faster startup for micro delays during runtime

Thank you,

Alex.