[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-02-13 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #36 from m...@blackmans.org ---
(In reply to Yann Ylavic from comment #34)
> Created attachment 35723 [details]
> Reuse SHMs names on restart or stop/start (2.4.x)
> 
> This is the full patch proposed to be backported to 2.4.next.
> 
> It should reuse the SHMs names as much as possible on restart or stop/start,
> which should address the increasing number of IPCs on the system if/when the
> parent process crashes.
> 
> Please note that it won't reuse SHMs if, by some means, child processes from
> an old httpd instance (whose parent process crashed) are still alive; this
> is not something desirable.
> 
> Could you test it with your large configuration?

Thanks, we will aim to test it in our next scheduled update, early March.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: bugs-unsubscr...@httpd.apache.org
For additional commands, e-mail: bugs-h...@httpd.apache.org



[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-02-13 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

Graham Leggett  changed:

      What|Removed|Added

    Status|NEW    |RESOLVED
Resolution|---    |FIXED

--- Comment #35 from Graham Leggett  ---
Backported to v2.4.30.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-02-08 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

Yann Ylavic  changed:

                         What|Removed|Added

Attachment #35702 is obsolete|0      |1

--- Comment #34 from Yann Ylavic  ---
Created attachment 35723
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35723&action=edit
Reuse SHMs names on restart or stop/start (2.4.x)

This is the full patch proposed to be backported to 2.4.next.

It should reuse the SHMs names as much as possible on restart or stop/start,
which should address the increasing number of IPCs on the system if/when the
parent process crashes.

Please note that it won't reuse SHMs if, by some means, child processes from an
old httpd instance (whose parent process crashed) are still alive; this is not
something desirable.

Could you test it with your large configuration?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #33 from Yann Ylavic  ---
Mark, I followed up on dev@, since debugging is not really suitable for bugzilla.
Thanks.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #32 from m...@blackmans.org ---
So sig_coredump is being triggered by an unknown signal, multiple times a day.
It's not a segfault, and there is nothing in /var/log/messages. That results in
a bunch of undeleted shared memory segments, and probably some that are no
longer in the global list but are still present in the kernel.
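A minimal sketch of why an undeleted segment turns into the "(17)File exists" error on the next start. It assumes (not confirmed in this thread) that apr_shm_create() on this platform derives a System V key from the placeholder file with ftok(path, 1) and creates the segment with IPC_CREAT | IPC_EXCL. It uses raw SysV calls rather than APR; reproduce_eexist() and the /tmp path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical demo: a segment keyed on a file outlives the process that
 * created it, so a later exclusive create on the same key fails.
 * Returns 0 if the sequence ran; *second_errno receives the errno of the
 * second create, *recreated is set to 1 if a create after IPC_RMID works. */
static int reproduce_eexist(int *second_errno, int *recreated)
{
    char path[] = "/tmp/slotmem-demo-XXXXXX";
    int fd, id1, id2, id3;
    key_t key;

    assert(second_errno != NULL && recreated != NULL);
    *second_errno = 0;
    *recreated = 0;

    fd = mkstemp(path);            /* placeholder file, only used to derive a key */
    if (fd < 0)
        return -1;
    close(fd);

    key = ftok(path, 1);
    if (key == (key_t)-1) {
        unlink(path);
        return -1;
    }

    /* "Old parent": creates the segment, then dies without IPC_RMID. */
    id1 = shmget(key, 1024, IPC_CREAT | IPC_EXCL | 0600);
    if (id1 < 0) {
        unlink(path);
        return -1;
    }

    /* "New parent": an exclusive create on the same key hits the leftover. */
    id2 = shmget(key, 1024, IPC_CREAT | IPC_EXCL | 0600);
    if (id2 < 0)
        *second_errno = errno;     /* expected: EEXIST (17 on Linux) */
    else
        shmctl(id2, IPC_RMID, NULL);

    /* Removing the stale segment first lets the exclusive create succeed. */
    shmctl(id1, IPC_RMID, NULL);
    id3 = shmget(key, 1024, IPC_CREAT | IPC_EXCL | 0600);
    if (id3 >= 0) {
        *recreated = 1;
        shmctl(id3, IPC_RMID, NULL);
    }

    unlink(path);
    return 0;
}
```

On Linux, the second create should fail with EEXIST and the create after IPC_RMID should succeed. Leftover segments such as id1 are the ones `ipcs -m` lists and `ipcrm -m` removes.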




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #31 from m...@blackmans.org ---
(In reply to Yann Ylavic from comment #27)
> (In reply to mark from comment #7)
> > 
> > " AH00060: seg fault or similar nasty error detected in the parent process"
> > but I cannot tell what it's referring to.
> 
> The parent process crashed leaving children orphaned (hence attached to
> SHMs).
> 
> You possibly need this patch too:
> https://svn.apache.org/repos/asf/httpd/httpd/patches/2.4.x/stop_signals-
> PR61558.patch
> It was merged for upcoming 2.4.30 already (r1820794).
> See Bug 61558.

I can't see evidence of a crash beyond that message. Could it be referring to
the exit triggered by the "file exists" problem?

i.e. a HUP is received and the SHMs are marked as deleted, but processes are
still attached, so the SHMs are still present for the HUP restart; that
triggers the "crash" exit, and thus other SHMs fail to get deleted?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #30 from m...@blackmans.org ---
(In reply to Yann Ylavic from comment #26)
> (In reply to mark from comment #25)
> > 
> > [Wed Jan 31 12:26:41.463136 2018] [slotmem_shm:debug] [pid 1322:tid
> > 139715805775616] mod_slotmem_shm.c(463): AH02301: attach looking for
> > /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm
> > [Wed Jan 31 12:26:41.463169 2018] [slotmem_shm:debug] [pid 1322:tid
> > 139715805775616] mod_slotmem_shm.c(476): AH02302: attach found
> ^ This is a child process attaching the SHMs created by the parent process.
> 
> > /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
> > [Wed Jan 31 12:33:51.761487 2018] [mpm_event:notice] [pid 65265:tid
> > 139715805775616] AH00494: SIGHUP received.  Attempting to restart
> ^ This is the parent process asked to restart (non graceful).
> 
> > [Wed Jan 31 12:34:54.471933 2018] [slotmem_shm:debug] [pid 20672:tid
> > 139965041129216] mod_slotmem_shm.c(331): AH02602: create didn't find
> > /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm in global list
> > [Wed Jan 31 12:34:54.471939 2018] [slotmem_shm:debug] [pid 20672:tid
> > 139965041129216] mod_slotmem_shm.c(341): AH02300: create
> > /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
> > [Wed Jan 31 12:34:54.471970 2018] [slotmem_shm:error] [pid 20672:tid
> > 139965041129216] (17)File exists: AH02611: create:
> ^ This is *another* parent process (not the same pid), ditto for the
> following messages (stripped here).
> 
> How so? One minute for a non-graceful restart looks huge too.
> Do you have multiple instances of httpd running (and using the same log
> file)?
> Could you monitor the processes here?

We have multiple configurations running, but each with its own log files. We
have both Apache 2.2 and Apache 2.4 configurations running side by side, but
completely isolated in terms of configuration, log and run directories. Each
of our configuration files tends to have around 200k lines, including comments
and blank lines, and we use a lot of 3rd-party modules, so they're big
configurations.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #29 from Yann Ylavic  ---
> (In reply to Yann Ylavic from comment #24)
> > Created attachment 35710 [details]
> > Unique balancer id per vhost
> 
> Committed to trunk in r1822800.
Reverted: everything was already there (sname vs name).




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

Yann Ylavic  changed:

                         What|Removed|Added

Attachment #35710 is obsolete|0      |1




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

Yann Ylavic  changed:

   What|Removed |Added

   Keywords||FixedInTrunk

--- Comment #28 from Yann Ylavic  ---
(In reply to Yann Ylavic from comment #14)
> Created attachment 35702 [details]
> slotmem SHMs reuse (2.4.x)

Committed to trunk in r1822509.

(In reply to Yann Ylavic from comment #24)
> Created attachment 35710 [details]
> Unique balancer id per vhost

Committed to trunk in r1822800.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #27 from Yann Ylavic  ---
(In reply to mark from comment #7)
> 
> " AH00060: seg fault or similar nasty error detected in the parent process"
> but I cannot tell what it's referring to.

The parent process crashed leaving children orphaned (hence attached to SHMs).

You possibly need this patch too:
https://svn.apache.org/repos/asf/httpd/httpd/patches/2.4.x/stop_signals-PR61558.patch
It was merged for upcoming 2.4.30 already (r1820794).
See Bug 61558.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #26 from Yann Ylavic  ---
(In reply to mark from comment #25)
> 
> [Wed Jan 31 12:26:41.463136 2018] [slotmem_shm:debug] [pid 1322:tid
> 139715805775616] mod_slotmem_shm.c(463): AH02301: attach looking for
> /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm
> [Wed Jan 31 12:26:41.463169 2018] [slotmem_shm:debug] [pid 1322:tid
> 139715805775616] mod_slotmem_shm.c(476): AH02302: attach found
^ This is a child process attaching the SHMs created by the parent process.

> /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
> [Wed Jan 31 12:33:51.761487 2018] [mpm_event:notice] [pid 65265:tid
> 139715805775616] AH00494: SIGHUP received.  Attempting to restart
^ This is the parent process asked to restart (non graceful).

> [Wed Jan 31 12:34:54.471933 2018] [slotmem_shm:debug] [pid 20672:tid
> 139965041129216] mod_slotmem_shm.c(331): AH02602: create didn't find
> /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm in global list
> [Wed Jan 31 12:34:54.471939 2018] [slotmem_shm:debug] [pid 20672:tid
> 139965041129216] mod_slotmem_shm.c(341): AH02300: create
> /var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
> [Wed Jan 31 12:34:54.471970 2018] [slotmem_shm:error] [pid 20672:tid
> 139965041129216] (17)File exists: AH02611: create:
^ This is *another* parent process (not the same pid), ditto for the following
messages (stripped here).

How so? One minute for a non-graceful restart looks huge too.
Do you have multiple instances of httpd running (and using the same log file)?
Could you monitor the processes here?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #25 from m...@blackmans.org ---
Perhaps, but we're seeing the error in the balancer SHMs as well as in the
worker SHMs, and the balancer SHM already uses conf->id as a distinguisher.

https://github.com/apache/httpd/blob/2.4.29/modules/proxy/mod_proxy_balancer.c#L814

For this balancer (not a worker), even with Jim's change, we saw the following.
Summarizing first:

12:26:41  - attach found and attached to slotmem-shm-p701d8bbe_0
12:33:51  - SIGHUP
12:34:54  - create (not attach) fails to find slotmem-shm-p701d8bbe_0
12:34:54  - create fails to create because the SHM key/segment is still in the
kernel
12:38:46  - create (under a new PID) fails to find slotmem-shm-p701d8bbe_0 but
successfully creates it, presumably because all attached processes finally
detached.

Why didn't the generation change? It was zero before and after the HUP.

[Wed Jan 31 12:26:41.463136 2018] [slotmem_shm:debug] [pid 1322:tid
139715805775616] mod_slotmem_shm.c(463): AH02301: attach looking for
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm
[Wed Jan 31 12:26:41.463169 2018] [slotmem_shm:debug] [pid 1322:tid
139715805775616] mod_slotmem_shm.c(476): AH02302: attach found
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
[Wed Jan 31 12:33:51.761487 2018] [mpm_event:notice] [pid 65265:tid
139715805775616] AH00494: SIGHUP received.  Attempting to restart
[Wed Jan 31 12:34:54.471933 2018] [slotmem_shm:debug] [pid 20672:tid
139965041129216] mod_slotmem_shm.c(331): AH02602: create didn't find
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm in global list
[Wed Jan 31 12:34:54.471939 2018] [slotmem_shm:debug] [pid 20672:tid
139965041129216] mod_slotmem_shm.c(341): AH02300: create
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
[Wed Jan 31 12:34:54.471970 2018] [slotmem_shm:error] [pid 20672:tid
139965041129216] (17)File exists: AH02611: create:
apr_shm_create(/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm) failed
[Wed Jan 31 12:38:46.746713 2018] [slotmem_shm:debug] [pid 31117:tid
140506605512448] mod_slotmem_shm.c(331): AH02602: create didn't find
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm in global list
[Wed Jan 31 12:38:46.746719 2018] [slotmem_shm:debug] [pid 31117:tid
140506605512448] mod_slotmem_shm.c(341): AH02300: create
/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm: 992/6
[Wed Jan 31 12:38:46.746893 2018] [slotmem_shm:debug] [pid 31117:tid
140506605512448] mod_slotmem_shm.c(384): AH02611: create:
apr_shm_create(/var/run/http/apache24/tmp/slotmem-shm-p701d8bbe_0.shm)
succeeded
[Wed Jan 31 12:38:49.922030 2018] [mpm_event:notice] [pid 31117:tid
140506605512448] AH00489: Apache/2.4.29 (Unix) OpenSSL/1.0.2n mod_fcgid/2.3.9
mod_auth_kerb/5.4 mod_qos/11.43 mod_jk/1.2.42 configured -- resuming




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #24 from Yann Ylavic  ---
Created attachment 35710
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35710&action=edit
Unique balancer id per vhost

It seems indeed that if balancer:// names are not unique, the slotmem is reused
across vhosts.

Does this patch help?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-31 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #23 from m...@blackmans.org ---
That more conservative patch doesn't seem to have helped either.

[Wed Jan 31 08:44:12.677361 2018] [proxy:debug] [pid 58615:tid 140446564935424]
proxy_util.c(1225): AH02337: copying shm[2] (0x7fbc398b07d8) for
balancer://balancer3
[Wed Jan 31 08:44:12.677429 2018] [slotmem_shm:debug] [pid 58615:tid
140446564935424] mod_slotmem_shm.c(331): AH02602: create didn't find
/var/run/http/apache24/tmp/slotmem-shm-p5dfa5b80_balancer3_0.shm in global list
[Wed Jan 31 08:44:12.677469 2018] [slotmem_shm:debug] [pid 58615:tid
140446564935424] mod_slotmem_shm.c(341): AH02300: create
/var/run/http/apache24/tmp/slotmem-shm-p5dfa5b80_balancer3_0.shm: 1176/2
[Wed Jan 31 08:44:12.677585 2018] [slotmem_shm:error] [pid 58615:tid
140446564935424] (17)File exists: AH02611: create:
apr_shm_create(/var/run/http/apache24/tmp/slotmem-shm-p5dfa5b80_balancer3_0.shm)
failed
[Wed Jan 31 08:44:12.677677 2018] [:emerg] [pid 58615:tid 140446564935424]
AH00020: Configuration Failed, exiting

We keep bumping into previously created keys. I wonder if our balancer naming
isn't distinctive enough: literally each vhost gets balancer1, balancer2 and
balancer3, so those names appear hundreds or thousands of times per
configuration, but always inside a VirtualHost container.

Any ideas?
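The slotmem file names in these logs (e.g. slotmem-shm-p5dfa5b80_balancer3_0.shm) suggest the name combines the per-vhost conf->id with the balancer name. The helper below is a hypothetical sketch of that composition, inferred from the log file names rather than taken from mod_proxy_balancer.c: the same balancer name in two vhosts only collides if the vhost ids collide as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: compose "<vhost id>_<balancer name>", dropping the
 * "balancer://" scheme, as inferred from the file names in the logs. */
static int balancer_shm_name(char *out, size_t outlen,
                             const char *vhost_id, const char *balancer_url)
{
    static const char prefix[] = "balancer://";
    const char *name = balancer_url;
    int n;

    assert(out != NULL && vhost_id != NULL && balancer_url != NULL);
    if (strncmp(name, prefix, sizeof(prefix) - 1) == 0)
        name += sizeof(prefix) - 1;

    n = snprintf(out, outlen, "%s_%s", vhost_id, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;  /* -1 on truncation */
}
```

With this scheme, balancer://balancer3 under vhost id p5dfa5b80 yields p5dfa5b80_balancer3, matching the log above, while the same balancer under another vhost id yields a different name.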




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-30 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #22 from m...@blackmans.org ---
Anyway, in the absence of other ideas, we're going to revert for now to the more
conservative patch, even at the cost of cross-generation persistence:

http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/slotmem/mod_slotmem_shm.c?r1=1822341=1822340=1822341=patch




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-30 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #21 from m...@blackmans.org ---
In the patched file, line 396 was updated with "gpool", I believe. Should line
395 have been updated as well?

393 {
394     if (fbased) {
395         apr_shm_remove(fname, pool);
396         rv = apr_shm_create(&shm, size, fname, gpool);
397     }




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-30 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #20 from m...@blackmans.org ---
Sorry, I am wrong, we are still seeing the "file exists" error in our logs.

[Tue Jan 30 09:07:05.575349 2018] [slotmem_shm:debug] [pid 3716:tid
139969799624448] mod_slotmem_shm.c(380): AH02602: create didn't find
/var/run/http/apache24/tmp/slotmem-shm-p7a67b429_balancer1.shm in global list
[Tue Jan 30 09:07:05.575357 2018] [slotmem_shm:debug] [pid 3716:tid
139969799624448] mod_slotmem_shm.c(390): AH02300: create
/var/run/http/apache24/tmp/slotmem-shm-p7a67b429_balancer1.shm: 1176/2
[Tue Jan 30 09:07:05.575398 2018] [slotmem_shm:error] [pid 3716:tid
139969799624448] (17)File exists: AH02611: create:
apr_shm_create(/var/run/http/apache24/tmp/slotmem-shm-p7a67b429_balancer1.shm)
failed
[Tue Jan 30 09:07:05.575442 2018] [:emerg] [pid 3716:tid 139969799624448]
AH00020: Configuration Failed, exiting




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-30 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #19 from m...@blackmans.org ---
We were able to rebuild and deploy Yann's patch for the pre-production
environments and we're not yet seeing slotmem_shm "File Exists" errors.
However, we are seeing a lot of orphaned shared memory segments (i.e. with zero
attached processes), as though cleanup is not happening appropriately or is
getting bypassed.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-29 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #18 from m...@blackmans.org ---
Thanks for the perspective. We were seeing Apache instances fail and not
restart due to the orphaned segments, requiring manual intervention to resolve,
hence our urgency.

However, I see your point now: this "Windows" fix loses too much state to be
the right long-term fix, and we make extensive use of the proxy balancer
feature, so I will see about an exceptional change to test this more
comprehensive patch in our pre-production environments.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-29 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #17 from Yann Ylavic  ---
In any case, if you go with the "Windows" approach for your production, we are
still interested in your testing of attachment 35702 for the future ;)




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-29 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #16 from Yann Ylavic  ---
(In reply to mark from comment #15)
> Is this later patch either more robust or more
> comprehensive than Jim's?  If you're making a strong recommendation, we will
> see about pushing that version out to the pre-production environments as an
> exceptional change, in advance of the next scheduled roll-out.
I can't make a recommendation given your time constraints. What I can tell you
is that even if the Windows approach indeed avoids the (re)start failures, it
does not preserve the state of the balancers across restarts (including
graceful ones).
So things like load distribution, error states, etc. are reset/lost, as if it
were the first startup.

This is not the right fix for httpd, but it may be enough for your use case...




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-29 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #15 from m...@blackmans.org ---
We were a bit keen for a fix for this morning (29 Jan), so we went with Jim's
patch in trunk as it looked very conservative (extending tested behaviours to
Unix from Windows). I didn't see your patch at that point.

http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/slotmem/mod_slotmem_shm.c?r1=1822341=1822340=1822341=patch

and we're now rolling that out across the pre-production environments today, 29
Jan.

I can't really comment on the relative merits of either approach, so could you
give me a recommendation? Is this later patch more robust or more
comprehensive than Jim's? If you're making a strong recommendation, we will
see about pushing that version out to the pre-production environments as an
exceptional change, in advance of the next scheduled roll-out.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-28 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

Yann Ylavic  changed:

                         What|Removed|Added

Attachment #35698 is obsolete|0      |1

--- Comment #14 from Yann Ylavic  ---
Created attachment 35702
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35702&action=edit
slotmem SHMs reuse (2.4.x)

This patch:
1/ uses a constant file name on all systems (no generation suffix),
2/ maintains the list of the created SHMs *across restarts*,
3/ no longer unlinks the files on (graceful) restart (not needed),
4/ no longer attaches in slotmem_create() (not needed),
5/ adds a type/size consistency check for persisted slots on restoration,
6/ unlinks the files only on stop/exit or before creating them (crash
remainder).
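A rough in-memory model of points 2/ and 5/: a list of created SHMs that outlives a restart, where a persisted entry is reused only if its item size and item count still match (the "992/6" printed in the create logs appears to be those two values). The struct, enum and function names are hypothetical; this is not the code of attachment 35702.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one created SHM, kept in a list that is not
 * cleared on restart. */
struct shm_entry {
    char name[256];
    size_t item_size;
    unsigned int item_num;
    struct shm_entry *next;
};

enum shm_lookup { SHM_REUSED, SHM_CREATED, SHM_MISMATCH, SHM_NOMEM };

/* Reuse a persisted entry when name, item size and item count all match;
 * report a mismatch (left to the caller) if only the name matches. */
static enum shm_lookup shm_reuse_or_create(struct shm_entry **list,
                                           const char *name,
                                           size_t item_size,
                                           unsigned int item_num)
{
    struct shm_entry *e;

    assert(list != NULL && name != NULL);
    for (e = *list; e != NULL; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            if (e->item_size != item_size || e->item_num != item_num)
                return SHM_MISMATCH;
            return SHM_REUSED;
        }
    }

    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return SHM_NOMEM;
    strncpy(e->name, name, sizeof(e->name) - 1);  /* calloc keeps it NUL-terminated */
    e->item_size = item_size;
    e->item_num = item_num;
    e->next = *list;
    *list = e;
    return SHM_CREATED;
}

/* In this model the list is only released on stop/exit. */
static void shm_list_free(struct shm_entry *list)
{
    while (list != NULL) {
        struct shm_entry *next = list->next;
        free(list);
        list = next;
    }
}
```

A restart with an unchanged configuration finds the entry and reuses it, while a changed layout is reported instead of being restored blindly.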

Mark, could you please try it?

I think we could avoid 6/ if we removed the file just after the SHM is created.
This would work on systems with "unlink semantics" (i.e. unlinking is allowed
while some descriptors are still open, the actual removal happening when the
last one is closed), since we don't need to re-open the files now, but not on
others, so I kept the code generic to start with...
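The "unlink semantics" mentioned above can be illustrated with a plain file on a POSIX system: once unlinked, the name is gone from the directory, but an already open descriptor keeps the object alive until it is closed. This is a hypothetical demo using an ordinary temp file, not an APR shared memory segment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes payload to a temp file, unlinks the file while the descriptor is
 * still open, then checks the data is still readable through it.
 * Returns 0 on success, -1 otherwise. */
static int unlink_while_open_demo(const char *payload)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    char buf[64];
    size_t len;
    int fd, ok;

    assert(payload != NULL);
    len = strlen(payload);
    if (len >= sizeof(buf))
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, payload, len) != (ssize_t)len || unlink(path) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* The name is gone from the directory... */
    ok = (access(path, F_OK) != 0);

    /* ...but the open descriptor still reaches the data. */
    memset(buf, 0, sizeof(buf));
    ok = ok && lseek(fd, 0, SEEK_SET) == 0
            && read(fd, buf, len) == (ssize_t)len
            && memcmp(buf, payload, len) == 0;

    close(fd);
    return ok ? 0 : -1;
}
```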




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #13 from m...@blackmans.org ---
Yes, I can test fixes.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #12 from Jim Jagielski  ---
Upon review, it appears that in slotmem_filenames() there is code that will
automagically add generational data to the SHM filename... this is done by
default for Win and OS/2.

Are you able to test any fixes?
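A hypothetical sketch of a generation-suffixed file name, loosely modelled on the slotmem_filenames() behaviour described above. The "_%x" format is an assumption, chosen because it reproduces names such as slotmem-shm-p701d8bbe_0.shm from the logs. It also shows a limitation: the generation only differs between restarts of the same parent, while a freshly started parent (assumed to count from 0) builds exactly the name a crashed parent left behind.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<dir>/slotmem-shm-<slotname>[_<gen>].shm".
 * The format is inferred from the log file names, not copied from httpd. */
static int slotmem_fname(char *out, size_t outlen, const char *dir,
                         const char *slotname, int with_generation,
                         unsigned int generation)
{
    int n;

    assert(out != NULL && dir != NULL && slotname != NULL);
    if (with_generation)
        n = snprintf(out, outlen, "%s/slotmem-shm-%s_%x.shm",
                     dir, slotname, generation);
    else
        n = snprintf(out, outlen, "%s/slotmem-shm-%s.shm", dir, slotname);

    /* -1 on encoding error or truncation */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With with_generation set to 0 the name is constant across generations, which is the behaviour point 1/ of attachment 35702 (comment #14) opts for on all systems.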

-- 
You are receiving this mail because:
You are the assignee for the bug.
-
To unsubscribe, e-mail: bugs-unsubscr...@httpd.apache.org
For additional commands, e-mail: bugs-h...@httpd.apache.org



[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #11 from Jim Jagielski  ---
Yeah, it looks like adding the generation to conf->id will create a unique
name. But I need to see how it affects persistence.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #10 from m...@blackmans.org ---
baked here:

https://github.com/apache/httpd/blob/2.4.29/modules/proxy/mod_proxy_balancer.c#L787

id = apr_psprintf(pconf, "%s.%s.%d.%s.%s.%u.%s",
                  (s->server_scheme ? s->server_scheme : ""),
                  (s->server_hostname ? s->server_hostname : "???"),
                  (int)s->port,
                  (s->server_admin ? s->server_admin : "??"),
                  (s->defn_name ? s->defn_name : "?"),
                  s->defn_line_number,
                  (s->error_fname ? s->error_fname : DEFAULT_ERRORLOG));

conf->id = apr_psprintf(pconf, "p%x",
                        ap_proxy_hashfunc(id, PROXY_HASHFUNC_DEFAULT));
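A standalone sketch of the id computation quoted above. hash_default() is an sdbm-style hash, assumed (not verified here) to match what ap_proxy_hashfunc() computes with PROXY_HASHFUNC_DEFAULT; struct vhost_fields is a hypothetical stand-in for server_rec, and "logs/error_log" is assumed as the DEFAULT_ERRORLOG value. The id depends only on how and where the vhost is defined, so it stays the same across restarts, and two vhosts defined on different lines get different ids.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sdbm-style hash: hash = c + hash * 65599 (mod 2^32). */
static unsigned int hash_default(const char *str)
{
    unsigned int hash = 0;

    for (; *str; str++)
        hash = (unsigned char)*str + (hash << 6) + (hash << 16) - hash;
    return hash;
}

/* Hypothetical stand-in for the server_rec fields used by the snippet above. */
struct vhost_fields {
    const char *scheme, *hostname, *admin, *defn_name, *error_fname;
    int port;
    unsigned int defn_line_number;
};

/* Writes "p%x" of the hashed vhost key into out; returns -1 on truncation. */
static int vhost_id(const struct vhost_fields *v, char *out, size_t outlen)
{
    char key[1024];
    int n;

    assert(v != NULL && out != NULL);
    n = snprintf(key, sizeof(key), "%s.%s.%d.%s.%s.%u.%s",
                 v->scheme ? v->scheme : "",
                 v->hostname ? v->hostname : "???",
                 v->port,
                 v->admin ? v->admin : "??",
                 v->defn_name ? v->defn_name : "?",
                 v->defn_line_number,
                 v->error_fname ? v->error_fname : "logs/error_log");
    if (n < 0 || (size_t)n >= sizeof(key))
        return -1;
    n = snprintf(out, outlen, "p%x", hash_default(key));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because nothing in the key changes on restart, a restarted or freshly started parent computes the same id, and therefore targets the same SHM name, as the parent before it.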




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

cbarb...@okta.com changed:

   What|Removed |Added

 CC||cbarb...@okta.com




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #9 from Jim Jagielski  ---
... or possibly re-used?? I'll need to look. It's been a while since I've
reviewed that chunk of code.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #8 from Eric Covener  ---
(In reply to mark from comment #7)
> give that man a teddy bear. pid 13310 was born at 03:40:12 with a SIGHUP at
> 03:47:58 and then permanently exiting at 03:48:11.
> 
> [Thu Jan 25 03:40:22.300797 2018] [mpm_event:notice] [pid 13310:tid
> 140455729428224] AH00489: Apache/2.4.29 (Unix) OpenSSL/1.0.2n
> mod_fcgid/2.3.9 mod_auth_kerb/5.4 mod_qos/11.43 mod_jk/1.2.42 configured --
> resuming normal operations
> [Thu Jan 25 03:40:22.300851 2018] [core:notice] [pid 13310:tid
> 140455729428224] AH00094: Command line: '/apache24/bin/httpd -f
> /apache24/conf/dynamic/apache24/httpd.conf -D X'
> [Thu Jan 25 03:47:58.097848 2018] [mpm_event:notice] [pid 13310:tid
> 140455729428224] AH00494: SIGHUP received.  Attempting to restart
> [Thu Jan 25 03:48:11.467544 2018] [core:notice] [pid 13310:tid
> 140455729428224] AH00060: seg fault or similar nasty error detected in the
> parent process
> 
> so, the diagnosis probably remains roughly the same, some SHM keys are not
> getting removed or not removed quickly enough and are still in place the
> next time the same configuration starts up.

If this is the case, maybe we could bake the generation number into the filename.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #7 from m...@blackmans.org ---
Give that man a teddy bear. PID 13310 was born at 03:40:12, received a SIGHUP
at 03:47:58 and then exited permanently at 03:48:11.

[Thu Jan 25 03:40:22.300797 2018] [mpm_event:notice] [pid 13310:tid
140455729428224] AH00489: Apache/2.4.29 (Unix) OpenSSL/1.0.2n mod_fcgid/2.3.9
mod_auth_kerb/5.4 mod_qos/11.43 mod_jk/1.2.42 configured -- resuming normal
operations
[Thu Jan 25 03:40:22.300851 2018] [core:notice] [pid 13310:tid 140455729428224]
AH00094: Command line: '/apache24/bin/httpd -f
/apache24/conf/dynamic/apache24/httpd.conf -D X'
[Thu Jan 25 03:47:58.097848 2018] [mpm_event:notice] [pid 13310:tid
140455729428224] AH00494: SIGHUP received.  Attempting to restart
[Thu Jan 25 03:48:11.467544 2018] [core:notice] [pid 13310:tid 140455729428224]
AH00060: seg fault or similar nasty error detected in the parent process

So the diagnosis probably remains roughly the same: some SHM keys are not
getting removed, or not removed quickly enough, and are still in place the
next time the same configuration starts up.

I can't yet find any trace of the suggested seg fault, though. We do see this
line a lot:

" AH00060: seg fault or similar nasty error detected in the parent process"

but I cannot tell what it refers to.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #6 from m...@blackmans.org ---
Probably. We do a lot of restarts, both active ones to bring in managed
changes to the configuration (but only hourly) and reactive ones when Apache
stops responding. I will look into it and get back to you.

My feeling, after reading the code, is that an old process still hasn't
detached from the SHM segment, so the SHM key hangs around even though the
placeholder file does get deleted. When the next Apache process comes along,
presumably without a filled-in global list, it attempts to re-create a SHM key
that still hasn't quite been released by the previous process.
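The following sketch makes the "placeholder file deleted, key still alive" scenario concrete. It assumes the SysV backend derives the key from the placeholder file with ftok(), and the proj_id of 1 is also an assumption. Under those assumptions, the key cannot be recomputed from the name once the file is gone, because ftok() has to stat() the file. A remove-by-filename then has no way to find the segment. The names key_for_placeholder() and key_lost_after_unlink() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <unistd.h>

/* Derive a SysV IPC key from a placeholder file (proj_id 1 is an assumption
 * of this sketch). ftok() has to stat() the file, so it fails once the file
 * is gone, even if a segment created with the old key is still alive. */
static key_t key_for_placeholder(const char *path)
{
    assert(path != NULL);
    return ftok(path, 1);
}

/* Hypothetical demo: create a placeholder file, derive its key, delete the
 * file and try again. Returns 1 if the key was derivable before the unlink
 * but not after it (ftok failing with ENOENT), 0 otherwise, -1 on setup
 * failure. */
static int key_lost_after_unlink(void)
{
    char path[] = "/tmp/slotmem-demo-XXXXXX";
    int fd = mkstemp(path);
    key_t before, after;

    if (fd < 0) {
        return -1;
    }
    close(fd);

    before = key_for_placeholder(path);
    unlink(path);
    after = key_for_placeholder(path);

    return before != (key_t)-1 && after == (key_t)-1 && errno == ENOENT;
}
```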




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #5 from Ruediger Pluem  ---
(In reply to mark from comment #0)
> With a large number of vhosts ( > 1000 ) and proxy balancer configurations (
> > 1000), we are seeing Apache exit at start up time with a configuration
> error (very frequently) with an error like. 
> 
> [Wed Jan 10 16:28:45.853599 2018] [slotmem_shm:error] [pid 29764:tid
> 140038537377536] (17)File exists: AH02611: create:
> apr_shm_create(/apache24/logs/slotmem-shm-p71143bd8_balancer1.shm) failed
> 
> [Wed Jan 10 16:28:45.853641 2018] [:emerg] [pid 29764:tid 140038537377536]
> AH00020: Configuration Failed, exiting 
> 
> turning on trace5 level logs we see things like the following for a single
> balancer worker (I filtered on the balance SHM name)
> 
> [Thu Jan 25 03:48:08.397926 2018] [slotmem_shm:debug] [pid 13310:tid
> 140455729428224] mod_slotmem_shm.c(364): AH02602: create didn't find
> /apache24/logs/slotmem-shm-pe1b232bb_balancer1.shm in global list


> [Thu Jan 25 03:48:58.529349 2018] [slotmem_shm:debug] [pid 45813:tid
> 139795075143424] mod_slotmem_shm.c(364): AH02602: create didn't find
> /apache24/logs/slotmem-shm-pe1b232bb_balancer1.shm in global list

Hm. The above two lines are weird. mod_proxy_balancer only creates the shm
segments in the post_config phase, where there is still only one httpd process,
but I see two different pids in the above log messages. Did you do a graceful
restart between 03:48:08 and 03:48:58?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-26 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #4 from m...@blackmans.org ---
Thanks for looking. apr_shm_remove already does an apr_file_remove as its final
step, so I would be surprised if another one helps.




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-25 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #3 from Yann Ylavic  ---
Created attachment 35698
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=35698=edit
Also remove SHM file if any

Does this help?




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-25 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #2 from m...@blackmans.org ---
Looking at the code for apr_shm_remove at

https://github.com/apache/apr/blob/1.6.1/shmem/unix/shm.c#L436

I am reminded that

/* Indicate that the segment is to be destroyed as soon
 * as all processes have detached. This also disallows any
 * new attachments to the segment. */
if (shmctl(shmid, IPC_RMID, NULL) == -1) {
    goto shm_remove_failed;
}

So while the remove can succeed (although I note its return status isn't
tested here), the key will hang around until the last process detaches, so
the defensive measure isn't effective.

So, back to the original question: why does Apache think this slot isn't
already in the global list?
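Along the same lines, the sketch below shows a remove that checks and reports every step instead of failing silently. It assumes the same ftok(path, 1) key derivation as the excerpt's SysV path. The names shm_remove_checked() and demo_remove_twice() are hypothetical and are not APR functions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Remove a SysV segment by placeholder filename, reporting which step
 * failed. Returns 0 on success or a negative step number, with errno left
 * as set by the failing call. */
static int shm_remove_checked(const char *path)
{
    key_t key;
    int shmid;

    assert(path != NULL);
    if ((key = ftok(path, 1)) == (key_t)-1) {
        return -1;  /* placeholder file missing: key cannot be derived */
    }
    if ((shmid = shmget(key, 0, 0600)) == -1) {
        return -2;  /* no segment with this key (or no permission) */
    }
    if (shmctl(shmid, IPC_RMID, NULL) == -1) {
        return -3;  /* could not mark the segment for destruction */
    }
    /* IPC_RMID only marks the segment; the kernel frees it once the last
     * process has detached. */
    if (unlink(path) == -1) {
        return -4;
    }
    return 0;
}

/* Hypothetical demo: create a placeholder file plus an unattached segment
 * on its key, then remove twice. Returns 0 if setup worked, -1 otherwise. */
static int demo_remove_twice(int *first, int *second)
{
    char path[] = "/tmp/slotmem-demo-XXXXXX";
    int fd = mkstemp(path);
    key_t key;

    if (fd < 0) {
        return -1;
    }
    close(fd);

    key = ftok(path, 1);
    if (key == (key_t)-1 ||
        shmget(key, 4096, IPC_CREAT | IPC_EXCL | 0600) == -1) {
        unlink(path);
        return -1;
    }

    *first = shm_remove_checked(path);   /* file and segment both present */
    *second = shm_remove_checked(path);  /* file already unlinked */
    return 0;
}
```

In the demo, the first call succeeds. The second returns -1 because the placeholder file is already gone and the key can no longer be derived. A caller that ignores the return value cannot tell these two outcomes apart.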




[Bug 62044] shared memory segments are not found in global list, but appear to exist in kernel.

2018-01-25 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=62044

--- Comment #1 from m...@blackmans.org ---
I believe the error arises here

https://github.com/apache/httpd/blob/2.4.29/modules/slotmem/mod_slotmem_shm.c#L408

I assume the 'file exists' error refers to the SHM key rather than the
placeholder file in the filesystem.

However, there is a defensive removal of the key *before* the create, which
makes this error very mysterious; I think it should be nearly impossible to
fail here:

apr_shm_remove(fname, gpool);
rv = apr_shm_create(, size, fname, gpool);

Is there any possibility of some latency between the removal taking effect and
the create starting? Or could the remove fail silently?
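The "(17)File exists" in the log is errno 17, which is EEXIST on Linux. The sketch below illustrates that a leftover segment still holding the key is enough to make an exclusive shmget() fail with EEXIST. It uses hypothetical helper names and, again, assumes keys come from ftok(path, 1).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Exclusive create on a file-derived key. Returns the shmid, or -1 with
 * errno set (EEXIST if a segment with this key already exists). */
static int shm_create_exclusive(const char *path, size_t size)
{
    key_t key;

    assert(path != NULL);
    if ((key = ftok(path, 1)) == (key_t)-1) {
        return -1;
    }
    return shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Hypothetical demo: a leftover segment occupies the key (as if its removal
 * had failed silently), so the next exclusive create fails. Returns the
 * errno of the second create (0 if it unexpectedly succeeded), or -1 on
 * setup failure. */
static int demo_leftover_segment(void)
{
    char path[] = "/tmp/slotmem-demo-XXXXXX";
    int fd = mkstemp(path);
    int leftover, again, err;

    if (fd < 0) {
        return -1;
    }
    close(fd);

    leftover = shm_create_exclusive(path, 4096);
    if (leftover == -1) {
        unlink(path);
        return -1;
    }

    again = shm_create_exclusive(path, 4096);
    err = (again == -1) ? errno : 0;  /* capture before cleanup calls */

    /* Clean up: mark the segment(s) for destruction and drop the file. */
    shmctl(leftover, IPC_RMID, NULL);
    if (again != -1) {
        shmctl(again, IPC_RMID, NULL);
    }
    unlink(path);
    return err;
}
```

This only shows that the key alone can produce EEXIST. It does not settle which step inside apr_shm_create reported it here.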
