Re: [squid-dev] [PATCH] shared_memory_locking
On 03/10/2016 10:04 PM, Amos Jeffries wrote:
> Okay fine with me now.
>
> +1. Please apply :=)

Committed as trunk r14603 after fixing the lock() method description copy-paste error.

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] [PATCH] shared_memory_locking
Well, I'm with you on this. I haven't reviewed this code yet, but the words make sense. debugs() this or that, or abort() this or that, relates to crashes, which I have not seen for a very long time.

Eliezer

On 11/03/2016 06:39, Alex Rousskov wrote:
> On 03/10/2016 08:32 PM, Eliezer Croitoru wrote:
>> Can this be verified in any way?
>
> Verify that I am not imagining things? Sure! If looking at fatal() itself
> is not enough to realize that it does not log the FATAL message until
> _after_ calling a few "heavy" functions (which may fail on their own),
> then just call "abort()" from releaseServerSockets() to emulate such a
> failure, and you will not see any FATAL messages from Squid, no matter
> how many times it calls fatal() or fatalf().
>
> Alex.
Re: [squid-dev] [PATCH] shared_memory_locking
On 03/10/2016 08:32 PM, Eliezer Croitoru wrote:
> Can this be verified in any way?

Verify that I am not imagining things? Sure! If looking at fatal() itself is not enough to realize that it does not log the FATAL message until _after_ calling a few "heavy" functions (which may fail on their own), then just call "abort()" from releaseServerSockets() to emulate such a failure, and you will not see any FATAL messages from Squid, no matter how many times it calls fatal() or fatalf().

Alex.

> On 11/03/2016 05:28, Amos Jeffries wrote:
>> If there is a bug in fatalf() that's something else that needs to be
>> fixed. But there has been no sign that I'm aware of about any such issue
>> for any of the hundreds of other calls to it. Please don't make bad
>> code here just for that.
Re: [squid-dev] [PATCH] shared_memory_locking
On 03/10/2016 08:28 PM, Amos Jeffries wrote:
>>> * fatalf() sends a FATAL level error to cache.log. No need to precede it
>>> with a less important debugs ERROR message saying the same thing.
>
>> There is such a need, unfortunately: The FATAL error is printed much
>> later than the error is found, making triage a lot more difficult in
>> some cases. In the extreme cases, the FATAL error is not printed at all
>> (because Squid crashes long before fatal.cc code gets to printing the
>> error). Fixing this fatal() problem is outside this patch scope.
>
> I'm talking about these lines:
>
>     ... {
>     +    const int savedError = errno;
>     +    debugs(54, DBG_IMPORTANT, "ERROR: shared_memory_locking enabled but mlock(" <<
>     +           theName << ',' << theSize << ") failed: " << xstrerr(savedError));
>     +    fatalf("shared_memory_locking on but failed to mlock(%s, %" PRId64 "): %s\n",
>     +           theName.termedBuf(), theSize, xstrerr(savedError));
>     +}

Yes, me too.

>> To partially address your concern, I set debugs() level to 5 and removed
>> the ERROR prefix. This still helps with triage when higher debugging
>> levels are configured and makes the new code consistent with most other
>> fatal() calls in Segment.cc.
>
> If there is a bug in fatalf() that's something else that needs to be
> fixed.

Agreed, but, as I said, fixing fatalf() is out of scope.

> But there has been no sign that I'm aware of about any such issue
> for any of the hundreds of other calls to it. Please don't make bad
> code here just for that.

It looks like I should not have wasted time explaining why this debugs() is needed and why it actually improves code consistency in Segment.cc, but I am not going to fight over a debugs() line. The patch without it is attached.

Alex.

Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs out of RAM during runtime.

Why not enable it by default?
Unfortunately, locking requires privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is likely to cause too many complaints, especially since Squid has not required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default while making default locking failures non-fatal and warning about significant [accumulated] locking delays.

[diff of src/SquidConfig.h and src/cf.data.pre omitted; it matches the corresponding hunks attached to the original posting below]
Re: [squid-dev] [PATCH] shared_memory_locking
Just wondering, can this be verified in any way? I have seen a couple of times in the past that things were not as expected, but I do not have enough experience reading debug output.

Eliezer

On 11/03/2016 05:28, Amos Jeffries wrote:
>> To partially address your concern, I set debugs() level to 5 and removed
>> the ERROR prefix. This still helps with triage when higher debugging
>> levels are configured and makes the new code consistent with most other
>> fatal() calls in Segment.cc.
>
> If there is a bug in fatalf() that's something else that needs to be
> fixed. But there has been no sign that I'm aware of about any such issue
> for any of the hundreds of other calls to it. Please don't make bad
> code here just for that.
>
> Amos
Re: [squid-dev] [PATCH] shared_memory_locking
On 11/03/2016 4:01 a.m., Alex Rousskov wrote:
> On 03/10/2016 01:33 AM, Amos Jeffries wrote:
>> On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
>>> Hello,
>>>
>>> The attached patch adds a "shared_memory_locking" configuration
>>> directive to control mlock(2).
>>>
>>> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
>>> out of RAM during runtime. This has been discussed during the "[RFC] Fix
>>> shared memory initialization, cleanup. Ensure its usability." thread
>>> archived at
>>> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
>>>
>>> Why not enable locking by default? Unfortunately, locking requires
>>> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
>>> requiring locked memory by default is likely to cause too many
>>> complaints, especially since Squid has not required that before. Even
>>> "make check" will not work in most environments (without making our unit
>>> tests even less representative)! Thus, the default is off, at least for now.
>>>
>>> As we gain more experience, we may enable locking by default while
>>> making default locking failures non-fatal and warning about significant
>>> [accumulated] locking delays. The proposed code was written to make
>>> those possible future changes easier, but I am not volunteering to
>>> implement them at this time.
>>
>> In Ipc::Mem::Segment::lock():
>>
>> * please use #if defined() instead of #ifdef.
>
> Done.
>
>> * fatalf() sends a FATAL level error to cache.log. No need to precede it
>> with a less important debugs ERROR message saying the same thing.
>
> There is such a need, unfortunately: The FATAL error is printed much
> later than the error is found, making triage a lot more difficult in
> some cases. In the extreme cases, the FATAL error is not printed at all
> (because Squid crashes long before fatal.cc code gets to printing the
> error). Fixing this fatal() problem is outside this patch scope.
I'm talking about these lines:

    ... {
    +    const int savedError = errno;
    +    debugs(54, DBG_IMPORTANT, "ERROR: shared_memory_locking enabled but mlock(" <<
    +           theName << ',' << theSize << ") failed: " << xstrerr(savedError));
    +    fatalf("shared_memory_locking on but failed to mlock(%s, %" PRId64 "): %s\n",
    +           theName.termedBuf(), theSize, xstrerr(savedError));
    +}

> To partially address your concern, I set debugs() level to 5 and removed
> the ERROR prefix. This still helps with triage when higher debugging
> levels are configured and makes the new code consistent with most other
> fatal() calls in Segment.cc.

If there is a bug in fatalf() that's something else that needs to be fixed. But there has been no sign that I'm aware of about any such issue for any of the hundreds of other calls to it. Please don't make bad code here just for that.

Amos
Re: [squid-dev] [PATCH] shared_memory_locking
On 03/10/2016 01:33 AM, Amos Jeffries wrote:
> On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
>> Hello,
>>
>> The attached patch adds a "shared_memory_locking" configuration
>> directive to control mlock(2).
>>
>> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
>> out of RAM during runtime. This has been discussed during the "[RFC] Fix
>> shared memory initialization, cleanup. Ensure its usability." thread
>> archived at
>> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
>>
>> Why not enable locking by default? Unfortunately, locking requires
>> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
>> requiring locked memory by default is likely to cause too many
>> complaints, especially since Squid has not required that before. Even
>> "make check" will not work in most environments (without making our unit
>> tests even less representative)! Thus, the default is off, at least for now.
>>
>> As we gain more experience, we may enable locking by default while
>> making default locking failures non-fatal and warning about significant
>> [accumulated] locking delays. The proposed code was written to make
>> those possible future changes easier, but I am not volunteering to
>> implement them at this time.
>
> In Ipc::Mem::Segment::lock():
>
> * please use #if defined() instead of #ifdef.

Done.

> * fatalf() sends a FATAL level error to cache.log. No need to precede it
> with a less important debugs ERROR message saying the same thing.

There is such a need, unfortunately: The FATAL error is printed much later than the error is found, making triage a lot more difficult in some cases. In the extreme cases, the FATAL error is not printed at all (because Squid crashes long before fatal.cc code gets to printing the error). Fixing this fatal() problem is outside this patch scope.

To partially address your concern, I set debugs() level to 5 and removed the ERROR prefix.
This still helps with triage when higher debugging levels are configured and makes the new code consistent with most other fatal() calls in Segment.cc.

Thank you,

Alex.

Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs out of RAM during runtime.

Why not enable it by default? Unfortunately, locking requires privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is likely to cause too many complaints, especially since Squid has not required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default while making default locking failures non-fatal and warning about significant [accumulated] locking delays.

[diff of src/SquidConfig.h and src/cf.data.pre omitted; it matches the corresponding hunks attached to the original posting below]
Re: [squid-dev] [PATCH] shared_memory_locking
On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
> Hello,
>
> The attached patch adds a "shared_memory_locking" configuration
> directive to control mlock(2).
>
> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
> out of RAM during runtime. This has been discussed during the "[RFC] Fix
> shared memory initialization, cleanup. Ensure its usability." thread
> archived at
> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
>
> Why not enable locking by default? Unfortunately, locking requires
> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
> requiring locked memory by default is likely to cause too many
> complaints, especially since Squid has not required that before. Even
> "make check" will not work in most environments (without making our unit
> tests even less representative)! Thus, the default is off, at least for now.
>
> As we gain more experience, we may enable locking by default while
> making default locking failures non-fatal and warning about significant
> [accumulated] locking delays. The proposed code was written to make
> those possible future changes easier, but I am not volunteering to
> implement them at this time.

In Ipc::Mem::Segment::lock():

* please use #if defined() instead of #ifdef.

* fatalf() sends a FATAL level error to cache.log. No need to precede it with a less important debugs ERROR message saying the same thing.

Amos
[squid-dev] [PATCH] shared_memory_locking
Hello,

The attached patch adds a "shared_memory_locking" configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs out of RAM during runtime. This has been discussed during the "[RFC] Fix shared memory initialization, cleanup. Ensure its usability." thread archived at
http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html

Why not enable locking by default? Unfortunately, locking requires privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is likely to cause too many complaints, especially since Squid has not required that before. Even "make check" will not work in most environments (without making our unit tests even less representative)! Thus, the default is off, at least for now.

As we gain more experience, we may enable locking by default while making default locking failures non-fatal and warning about significant [accumulated] locking delays. The proposed code was written to make those possible future changes easier, but I am not volunteering to implement them at this time.

Thank you,

Alex.

Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs out of RAM during runtime.

Why not enable it by default? Unfortunately, locking requires privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is likely to cause too many complaints, especially since Squid has not required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default while making default locking failures non-fatal and warning about significant [accumulated] locking delays.
=== modified file 'src/SquidConfig.h'
--- src/SquidConfig.h	2016-01-15 06:47:59 +0000
+++ src/SquidConfig.h	2016-03-01 05:22:45 +0000
@@ -55,40 +55,41 @@
 public:
     int n_configured;
     /// number of disk processes required to support all cache_dirs
     int n_strands;
 };

 #define INDEXSD(i) (Config.cacheSwap.swapDirs[i].getRaw())
 }

 /// the representation of the configuration. POD.
 class SquidConfig
 {
 public:
     struct {
         /* These should be for the Store::Root instance.
          * this needs pluggable parsing to be done smoothly.
          */
         int highWaterMark;
         int lowWaterMark;
     } Swap;

     YesNoNone memShared; ///< whether the memory cache is shared among workers
+    YesNoNone shmLocking; ///< shared_memory_locking
     size_t memMaxSize;

     struct {
         int64_t min;
         int pct;
         int64_t max;
     } quickAbort;
     int64_t readAheadGap;
     RemovalPolicySettings *replPolicy;
     RemovalPolicySettings *memPolicy;
 #if USE_HTTP_VIOLATIONS
     time_t negativeTtl;
 #endif
     time_t maxStale;
     time_t negativeDnsTtl;
     time_t positiveDnsTtl;
     time_t shutdownLifetime;
     time_t backgroundPingRate;

     struct {

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2016-02-19 17:19:25 +0000
+++ src/cf.data.pre	2016-03-09 21:34:21 +0000
@@ -402,40 +402,73 @@
 LOC: Config.cpuAffinityMap
 DEFAULT: none
 DEFAULT_DOC: Let operating system decide.
 DOC_START
 	Usage: cpu_affinity_map process_numbers=P1,P2,... cores=C1,C2,...

 	Sets 1:1 mapping between Squid processes and CPU cores. For example,

 	    cpu_affinity_map process_numbers=1,2,3,4 cores=1,3,5,7

 	affects processes 1 through 4 only and places them on the first
 	four even cores, starting with core #1. CPU cores are numbered
 	starting from 1. Requires support for sched_getaffinity(2) and
 	sched_setaffinity(2) system calls.

 	Multiple cpu_affinity_map options are merged.

 	See also: workers
 DOC_END

+NAME: shared_memory_locking
+TYPE: YesNoNone
+COMMENT: on|off
+LOC: Config.shmLocking
+DEFAULT: off
+DOC_START
+	Whether to ensure that all required shared memory is available by
+	"locking" that shared memory into RAM when Squid starts. The
+	alternative is faster startup time followed by slightly slower
+	performance and, if not enough RAM is actually available during
+	runtime, mysterious crashes.
+
+	SMP Squid uses many shared memory segments. These segments are
+	brought into Squid memory space using an mmap(2) system call. During
+	Squid startup, the mmap() call often succeeds regardless of whether
+	the system has enough RAM. In general, Squid cannot tell whether the
+	kernel applies this "optimistic" memory allocation policy (but
+	popular modern kernels usually use it).
+
+	Later, if Squid attempts to actually access the mapped memory
+	regions beyond what the kernel is willing to allocate, the
+	"optimistic" kernel simply kills Squid kid with a SIGBUS signal.
+	Some of the memory limits enforced by the kernel are currently
+	poorly understood: We do not know how to detect and check them. This
+	option ensures that the mapped memory will be available.
+
+	This option may have