Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-24 Thread Alex Rousskov
On 03/10/2016 10:04 PM, Amos Jeffries wrote:
> Okay fine with me now.
> 
> +1. Please apply :=)

Committed as trunk r14603 after fixing the lock() method description
copy-paste error.

Alex.

___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Eliezer Croitoru

Well I'm with you on this.
I haven't reviewed this code yet, but the words make sense, so...
debugs() this or abort() that relating to crashes is something I have
not seen for a very long time.


Eliezer

On 11/03/2016 06:39, Alex Rousskov wrote:

On 03/10/2016 08:32 PM, Eliezer Croitoru wrote:


> Can this be verified in any way?

Verify that I am not imagining things? Sure! If looking at fatal()
itself is not enough to realize that it does not log the FATAL message
until _after_ calling a few "heavy" functions (which may fail on their
own), then just call "abort()" from releaseServerSockets() to emulate
such a failure, and you will not see any FATAL messages from Squid, no
matter how many times it calls fatal() or fatalf().

Alex.




Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Alex Rousskov
On 03/10/2016 08:32 PM, Eliezer Croitoru wrote:

> Can this be verified in any way?

Verify that I am not imagining things? Sure! If looking at fatal()
itself is not enough to realize that it does not log the FATAL message
until _after_ calling a few "heavy" functions (which may fail on their
own), then just call "abort()" from releaseServerSockets() to emulate
such a failure, and you will not see any FATAL messages from Squid, no
matter how many times it calls fatal() or fatalf().
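A minimal standalone sketch of this ordering problem (hypothetical names, not Squid code; the crash is emulated with an exception rather than abort() so the sketch can be exercised safely):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// If fatal() runs "heavy" cleanup before logging, the FATAL message is
// lost whenever that cleanup itself crashes.
static std::string cacheLog;          // stands in for cache.log
static bool cleanupCrashes = false;   // flip to emulate a cleanup failure

static void releaseServerSocketsSketch()
{
    if (cleanupCrashes)
        throw std::runtime_error("crash before logging"); // emulates abort()
}

static void fatalSketch(const std::string &msg)
{
    releaseServerSocketsSketch();         // heavy work happens first...
    cacheLog += "FATAL: " + msg + "\n";   // ...logging happens only after
}
```

When cleanupCrashes is set, fatalSketch() never reaches the logging line, which mirrors the "no FATAL messages from Squid" behavior described above.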

Alex.



> On 11/03/2016 05:28, Amos Jeffries wrote:
>> If there is a bug in fatalf() that's something else that needs to be
>> fixed. But there has been no sign, that I'm aware of, of any such issue
>> with any of the hundreds of other calls to it. Please don't make bad
>> code here just for that.



Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Alex Rousskov
On 03/10/2016 08:28 PM, Amos Jeffries wrote:

>>> * fatalf() sends a FATAL level error to cache.log. No need to precede it
>>> with a less important debugs ERROR message saying the same thing.

>> There is such a need, unfortunately: The FATAL error is printed much
>> later than the error is found, making triage a lot more difficult in
>> some cases. In the extreme cases, the FATAL error is not printed at all
>> (because Squid crashes long before fatal.cc code gets to printing the
>> error). Fixing this fatal() problem is outside this patch scope.


> I'm talking about these lines:
> 
>  ... {
> +const int savedError = errno;
> +const int savedError = errno;
> +debugs(54, DBG_IMPORTANT, "ERROR: shared_memory_locking enabled but mlock(" << theName << ',' << theSize << ") failed: " << xstrerr(savedError));
> +fatalf("shared_memory_locking on but failed to mlock(%s, %" PRId64 "): %s\n",
> +   theName.termedBuf(), theSize, xstrerr(savedError));
> +}

Yes, me too.


>> To partially address your concern, I set debugs() level to 5 and removed
>> the ERROR prefix. This still helps with triage when higher debugging
>> levels are configured and makes the new code consistent with most other
>> fatal() calls in Segment.cc.

> If there is a bug in fatalf() that's something else that needs to be
> fixed. 

Agreed, but, as I said, fixing fatalf() is out of scope.


> But there has been no sign, that I'm aware of, of any such issue
> with any of the hundreds of other calls to it. Please don't make bad
> code here just for that.

It looks like I should not have wasted time explaining why this debugs()
is needed and why it actually improves code consistency in Segment.cc,
but I am not going to fight over a debugs() line. The patch without it
is attached.

Alex.
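The savedError idiom in the hunk quoted above exists because errno is fragile; a minimal sketch of the pattern (hypothetical helper, not Squid's code):

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// errno must be captured immediately after the failing call, because
// any later call (logging, formatting, allocation) may clobber it
// before the error message is built.
std::string describeMlockFailure(const std::string &name, long size)
{
    const int savedError = errno;   // capture first, use later
    return "mlock(" + name + ", " + std::to_string(size) + ") failed: "
        + std::strerror(savedError);
}
```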

Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs
out of RAM during runtime. Why not enable it by default? Unfortunately,
locking requires privileges and/or much-higher-than-default
RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is
likely to cause too many complaints, especially since Squid has not
required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default
while making default locking failures non-fatal and warning about
significant [accumulated] locking delays.
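The RLIMIT_MEMLOCK constraint mentioned above can be inspected with POSIX getrlimit(2); a hypothetical pre-flight check, not something the patch itself does:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

// Would the current RLIMIT_MEMLOCK soft limit allow mlock(2)-ing
// `needed` bytes? A sketch only; the real kernel accounting also
// counts other locked mappings in the process.
bool memlockLimitAllows(std::size_t needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return false; // cannot tell; assume the worst
    return rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= needed;
}
```

On typical Linux distributions the default soft limit is far below a realistic shared cache size, which is why requiring locked memory by default would generate complaints.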

=== modified file 'src/SquidConfig.h'
--- src/SquidConfig.h	2016-01-15 06:47:59 +
+++ src/SquidConfig.h	2016-03-01 05:22:45 +
@@ -55,40 +55,41 @@ public:
 int n_configured;
 /// number of disk processes required to support all cache_dirs
 int n_strands;
 };
 #define INDEXSD(i) (Config.cacheSwap.swapDirs[i].getRaw())
 }
 
 /// the representation of the configuration. POD.
 class SquidConfig
 {
 public:
 struct {
 /* These should be for the Store::Root instance.
 * this needs pluggable parsing to be done smoothly.
 */
 int highWaterMark;
 int lowWaterMark;
 } Swap;
 
 YesNoNone memShared; ///< whether the memory cache is shared among workers
+YesNoNone shmLocking; ///< shared_memory_locking
 size_t memMaxSize;
 
 struct {
 int64_t min;
 int pct;
 int64_t max;
 } quickAbort;
 int64_t readAheadGap;
 RemovalPolicySettings *replPolicy;
 RemovalPolicySettings *memPolicy;
 #if USE_HTTP_VIOLATIONS
 time_t negativeTtl;
 #endif
 time_t maxStale;
 time_t negativeDnsTtl;
 time_t positiveDnsTtl;
 time_t shutdownLifetime;
 time_t backgroundPingRate;
 
 struct {

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2016-02-19 17:19:25 +
+++ src/cf.data.pre	2016-03-09 21:34:21 +
@@ -402,40 +402,73 @@ LOC: Config.cpuAffinityMap
 DEFAULT: none
 DEFAULT_DOC: Let operating system decide.
 DOC_START
 	Usage: cpu_affinity_map process_numbers=P1,P2,... cores=C1,C2,...
 
 	Sets 1:1 mapping between Squid processes and CPU cores. For example,
 
 	cpu_affinity_map process_numbers=1,2,3,4 cores=1,3,5,7
 
 	affects processes 1 through 4 only and places them on the first
 	four even cores, starting with core #1.
 
 	CPU cores are numbered starting from 1. Requires support for
 	sched_getaffinity(2) and sched_setaffinity(2) system calls.
 
 	Multiple cpu_affinity_map options are merged.
 
 	See also: workers
 DOC_END
 
+NAME: shared_memory_locking
+TYPE: YesNoNone
+COMMENT: on|off
+LOC: Config.shmLocking
+DEFAULT: off
+DOC_START
+	Whether to ensure that all required shared memory is available by
+	"locking" that shared memory into RAM when Squid starts. The
+	alternative is faster startup time followed by slightly slower
+	performance and, if not enough RAM is actually available during
+	runtime, mysterious crashes.
+
+	SMP Squid uses many shared memory segments. These segments are
+	brought into Squid memory space using an mmap(2) system call. During
+	Squid startup, the mmap() 

Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Eliezer Croitoru

Just wondering,

Can this be verified in any way?
I have seen a couple of times in the past that things were not as they
were expected to be, but I do not have enough experience reading debug
output.


Eliezer

On 11/03/2016 05:28, Amos Jeffries wrote:

>> To partially address your concern, I set debugs() level to 5 and removed
>> the ERROR prefix. This still helps with triage when higher debugging
>> levels are configured and makes the new code consistent with most other
>> fatal() calls in Segment.cc.

> If there is a bug in fatalf() that's something else that needs to be
> fixed. But there has been no sign, that I'm aware of, of any such issue
> with any of the hundreds of other calls to it. Please don't make bad
> code here just for that.

Amos




Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Amos Jeffries
On 11/03/2016 4:01 a.m., Alex Rousskov wrote:
> On 03/10/2016 01:33 AM, Amos Jeffries wrote:
>> On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
>>> Hello,
>>>
>>> The attached patch adds a "shared_memory_locking" configuration
>>> directive to control mlock(2).
>>>
>>> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
>>> out of RAM during runtime. This has been discussed during the "[RFC] Fix
>>> shared memory initialization, cleanup. Ensure its usability." thread
>>> archived at
>>> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
>>>
>>> Why not enable locking by default? Unfortunately, locking requires
>>> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
>>> requiring locked memory by default is likely to cause too many
>>> complaints, especially since Squid has not required that before. Even
>>> "make check" will not work in most environments (without making our unit
>>> tests even less representative)! Thus, the default is off, at least for now.
>>>
>>> As we gain more experience, we may enable locking by default while
>>> making default locking failures non-fatal and warning about significant
>>> [accumulated] locking delays. The proposed code was written to make
>>> those possible future changes easier, but I am not volunteering to
>>> implement them at this time.
>>>
>>
>> In Ipc::Mem::Segment::lock():
>>
>> * please use #if defined() instead of #ifdef.
> 
> Done.
> 
> 
>> * fatalf() sends a FATAL level error to cache.log. No need to precede it
>> with a less important debugs ERROR message saying the same thing.
> 
> There is such a need, unfortunately: The FATAL error is printed much
> later than the error is found, making triage a lot more difficult in
> some cases. In the extreme cases, the FATAL error is not printed at all
> (because Squid crashes long before fatal.cc code gets to printing the
> error). Fixing this fatal() problem is outside this patch scope.

I'm talking about these lines:

 ... {
+const int savedError = errno;
+debugs(54, DBG_IMPORTANT, "ERROR: shared_memory_locking enabled but mlock(" << theName << ',' << theSize << ") failed: " << xstrerr(savedError));
+fatalf("shared_memory_locking on but failed to mlock(%s, %" PRId64 "): %s\n",
+   theName.termedBuf(), theSize, xstrerr(savedError));
+}

> 
> To partially address your concern, I set debugs() level to 5 and removed
> the ERROR prefix. This still helps with triage when higher debugging
> levels are configured and makes the new code consistent with most other
> fatal() calls in Segment.cc.

If there is a bug in fatalf() that's something else that needs to be
fixed. But there has been no sign, that I'm aware of, of any such issue
with any of the hundreds of other calls to it. Please don't make bad
code here just for that.

Amos



Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Alex Rousskov
On 03/10/2016 01:33 AM, Amos Jeffries wrote:
> On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
>> Hello,
>>
>> The attached patch adds a "shared_memory_locking" configuration
>> directive to control mlock(2).
>>
>> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
>> out of RAM during runtime. This has been discussed during the "[RFC] Fix
>> shared memory initialization, cleanup. Ensure its usability." thread
>> archived at
>> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
>>
>> Why not enable locking by default? Unfortunately, locking requires
>> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
>> requiring locked memory by default is likely to cause too many
>> complaints, especially since Squid has not required that before. Even
>> "make check" will not work in most environments (without making our unit
>> tests even less representative)! Thus, the default is off, at least for now.
>>
>> As we gain more experience, we may enable locking by default while
>> making default locking failures non-fatal and warning about significant
>> [accumulated] locking delays. The proposed code was written to make
>> those possible future changes easier, but I am not volunteering to
>> implement them at this time.
>>
> 
> In Ipc::Mem::Segment::lock():
> 
> * please use #if defined() instead of #ifdef.

Done.


> * fatalf() sends a FATAL level error to cache.log. No need to precede it
> with a less important debugs ERROR message saying the same thing.

There is such a need, unfortunately: The FATAL error is printed much
later than the error is found, making triage a lot more difficult in
some cases. In the extreme cases, the FATAL error is not printed at all
(because Squid crashes long before fatal.cc code gets to printing the
error). Fixing this fatal() problem is outside this patch scope.

To partially address your concern, I set debugs() level to 5 and removed
the ERROR prefix. This still helps with triage when higher debugging
levels are configured and makes the new code consistent with most other
fatal() calls in Segment.cc.


Thank you,

Alex.

Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs
out of RAM during runtime. Why not enable it by default? Unfortunately,
locking requires privileges and/or much-higher-than-default
RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is
likely to cause too many complaints, especially since Squid has not
required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default
while making default locking failures non-fatal and warning about
significant [accumulated] locking delays.

=== modified file 'src/SquidConfig.h'
--- src/SquidConfig.h	2016-01-15 06:47:59 +
+++ src/SquidConfig.h	2016-03-01 05:22:45 +
@@ -55,40 +55,41 @@ public:
 int n_configured;
 /// number of disk processes required to support all cache_dirs
 int n_strands;
 };
 #define INDEXSD(i) (Config.cacheSwap.swapDirs[i].getRaw())
 }
 
 /// the representation of the configuration. POD.
 class SquidConfig
 {
 public:
 struct {
 /* These should be for the Store::Root instance.
 * this needs pluggable parsing to be done smoothly.
 */
 int highWaterMark;
 int lowWaterMark;
 } Swap;
 
 YesNoNone memShared; ///< whether the memory cache is shared among workers
+YesNoNone shmLocking; ///< shared_memory_locking
 size_t memMaxSize;
 
 struct {
 int64_t min;
 int pct;
 int64_t max;
 } quickAbort;
 int64_t readAheadGap;
 RemovalPolicySettings *replPolicy;
 RemovalPolicySettings *memPolicy;
 #if USE_HTTP_VIOLATIONS
 time_t negativeTtl;
 #endif
 time_t maxStale;
 time_t negativeDnsTtl;
 time_t positiveDnsTtl;
 time_t shutdownLifetime;
 time_t backgroundPingRate;
 
 struct {

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2016-02-19 17:19:25 +
+++ src/cf.data.pre	2016-03-09 21:34:21 +
@@ -402,40 +402,73 @@ LOC: Config.cpuAffinityMap
 DEFAULT: none
 DEFAULT_DOC: Let operating system decide.
 DOC_START
 	Usage: cpu_affinity_map process_numbers=P1,P2,... cores=C1,C2,...
 
 	Sets 1:1 mapping between Squid processes and CPU cores. For example,
 
 	cpu_affinity_map process_numbers=1,2,3,4 cores=1,3,5,7
 
 	affects processes 1 through 4 only and places them on the first
 	four even cores, starting with core #1.
 
 	CPU cores are numbered starting from 1. Requires support for
 	sched_getaffinity(2) and sched_setaffinity(2) system calls.
 
 	Multiple cpu_affinity_map options are merged.
 
 	See also: workers
 DOC_END
 
+NAME: shared_memory_locking
+TYPE: YesNoNone
+COMMENT: on|off
+LOC: Config.shmLocking
+DEFAULT: off
+DOC_START
+	Whether to ensure that all required shared memory is available by
+	"locking" that shared memory 

Re: [squid-dev] [PATCH] shared_memory_locking

2016-03-10 Thread Amos Jeffries
On 10/03/2016 11:14 a.m., Alex Rousskov wrote:
> Hello,
> 
> The attached patch adds a "shared_memory_locking" configuration
> directive to control mlock(2).
> 
> Locking shared memory at startup avoids SIGBUS crashes when kernel runs
> out of RAM during runtime. This has been discussed during the "[RFC] Fix
> shared memory initialization, cleanup. Ensure its usability." thread
> archived at
> http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html
> 
> Why not enable locking by default? Unfortunately, locking requires
> privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
> requiring locked memory by default is likely to cause too many
> complaints, especially since Squid has not required that before. Even
> "make check" will not work in most environments (without making our unit
> tests even less representative)! Thus, the default is off, at least for now.
> 
> As we gain more experience, we may enable locking by default while
> making default locking failures non-fatal and warning about significant
> [accumulated] locking delays. The proposed code was written to make
> those possible future changes easier, but I am not volunteering to
> implement them at this time.
> 

In Ipc::Mem::Segment::lock():

* please use #if defined() instead of #ifdef.

* fatalf() sends a FATAL level error to cache.log. No need to precede it
with a less important debugs ERROR message saying the same thing.



Amos



[squid-dev] [PATCH] shared_memory_locking

2016-03-09 Thread Alex Rousskov
Hello,

The attached patch adds a "shared_memory_locking" configuration
directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs
out of RAM during runtime. This has been discussed during the "[RFC] Fix
shared memory initialization, cleanup. Ensure its usability." thread
archived at
http://lists.squid-cache.org/pipermail/squid-dev/2015-December/004112.html

Why not enable locking by default? Unfortunately, locking requires
privileges and/or much-higher-than-default RLIMIT_MEMLOCK limits. Thus,
requiring locked memory by default is likely to cause too many
complaints, especially since Squid has not required that before. Even
"make check" will not work in most environments (without making our unit
tests even less representative)! Thus, the default is off, at least for now.

As we gain more experience, we may enable locking by default while
making default locking failures non-fatal and warning about significant
[accumulated] locking delays. The proposed code was written to make
those possible future changes easier, but I am not volunteering to
implement them at this time.


Thank you,

Alex.
Added shared_memory_locking configuration directive to control mlock(2).

Locking shared memory at startup avoids SIGBUS crashes when kernel runs
out of RAM during runtime. Why not enable it by default? Unfortunately,
locking requires privileges and/or much-higher-than-default
RLIMIT_MEMLOCK limits. Thus, requiring locked memory by default is
likely to cause too many complaints, especially since Squid has not
required that before. The default is off, at least for now.

As we gain more experience, we may try to enable locking by default
while making default locking failures non-fatal and warning about
significant [accumulated] locking delays.

=== modified file 'src/SquidConfig.h'
--- src/SquidConfig.h	2016-01-15 06:47:59 +
+++ src/SquidConfig.h	2016-03-01 05:22:45 +
@@ -55,40 +55,41 @@ public:
 int n_configured;
 /// number of disk processes required to support all cache_dirs
 int n_strands;
 };
 #define INDEXSD(i) (Config.cacheSwap.swapDirs[i].getRaw())
 }
 
 /// the representation of the configuration. POD.
 class SquidConfig
 {
 public:
 struct {
 /* These should be for the Store::Root instance.
 * this needs pluggable parsing to be done smoothly.
 */
 int highWaterMark;
 int lowWaterMark;
 } Swap;
 
 YesNoNone memShared; ///< whether the memory cache is shared among workers
+YesNoNone shmLocking; ///< shared_memory_locking
 size_t memMaxSize;
 
 struct {
 int64_t min;
 int pct;
 int64_t max;
 } quickAbort;
 int64_t readAheadGap;
 RemovalPolicySettings *replPolicy;
 RemovalPolicySettings *memPolicy;
 #if USE_HTTP_VIOLATIONS
 time_t negativeTtl;
 #endif
 time_t maxStale;
 time_t negativeDnsTtl;
 time_t positiveDnsTtl;
 time_t shutdownLifetime;
 time_t backgroundPingRate;
 
 struct {

=== modified file 'src/cf.data.pre'
--- src/cf.data.pre	2016-02-19 17:19:25 +
+++ src/cf.data.pre	2016-03-09 21:34:21 +
@@ -402,40 +402,73 @@ LOC: Config.cpuAffinityMap
 DEFAULT: none
 DEFAULT_DOC: Let operating system decide.
 DOC_START
 	Usage: cpu_affinity_map process_numbers=P1,P2,... cores=C1,C2,...
 
 	Sets 1:1 mapping between Squid processes and CPU cores. For example,
 
 	cpu_affinity_map process_numbers=1,2,3,4 cores=1,3,5,7
 
 	affects processes 1 through 4 only and places them on the first
 	four even cores, starting with core #1.
 
 	CPU cores are numbered starting from 1. Requires support for
 	sched_getaffinity(2) and sched_setaffinity(2) system calls.
 
 	Multiple cpu_affinity_map options are merged.
 
 	See also: workers
 DOC_END
 
+NAME: shared_memory_locking
+TYPE: YesNoNone
+COMMENT: on|off
+LOC: Config.shmLocking
+DEFAULT: off
+DOC_START
+	Whether to ensure that all required shared memory is available by
+	"locking" that shared memory into RAM when Squid starts. The
+	alternative is faster startup time followed by slightly slower
+	performance and, if not enough RAM is actually available during
+	runtime, mysterious crashes.
+
+	SMP Squid uses many shared memory segments. These segments are
+	brought into Squid memory space using an mmap(2) system call. During
+	Squid startup, the mmap() call often succeeds regardless of whether
+	the system has enough RAM. In general, Squid cannot tell whether the
+	kernel applies this "optimistic" memory allocation policy (but
+	popular modern kernels usually use it).
+
+	Later, if Squid attempts to actually access the mapped memory
+	regions beyond what the kernel is willing to allocate, the
+	"optimistic" kernel simply kills Squid kid with a SIGBUS signal.
+	Some of the memory limits enforced by the kernel are currently
+	poorly understood: We do not know how to detect and check them. This
+	option ensures that the mapped memory will be available. 
+
+	This option may have
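The "optimistic" allocation policy that the documentation text describes can be observed directly: an anonymous mmap(2) far larger than physical RAM typically succeeds. A Linux-specific sketch, not part of the patch; MAP_NORESERVE makes the overcommit explicit:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// On popular kernels this mmap() succeeds regardless of available RAM;
// only touching the pages later can fail (SIGBUS or OOM kill), which is
// exactly the crash mode that mlock(2)-at-startup is meant to prevent.
void *optimisticMap(std::size_t bytes)
{
    return mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}
```

Mapping a terabyte on a 64-bit machine with a few gigabytes of RAM usually succeeds, which is why Squid cannot rely on the mmap() return value to guarantee that the shared segments are truly backed by memory.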