On Mar 6, 2015, at 15:06 , Arthur Emerson <arthur.emer...@msmc.edu> wrote:

> It ran for 18+ hours without httpd.webservices failing again, so it
> appears that AcceptMutex fixed it.  Can I suggest that this be made
> a permanent change for distribution in future CentOS builds?

Hi Arthur,
The problem is that we already changed it to the current value to fix a more 
widespread problem.

You are the second person reporting this problem.
For everyone else it seems to make things better.
 
I have never been able to reproduce this in my own environment.
Rest assured that the day I can reproduce it reliably is the day a permanent 
fix gets added to the package.
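
For reference, the knob in question is the AcceptMutex directive (Apache 2.2). 
A hypothetical httpd.conf override would look like this; which value behaves 
best is platform-dependent, so treat it as an illustration, not a recommendation:

```apache
# Hypothetical override in httpd.conf (Apache 2.2).  Valid values include
# flock, fcntl, posixsem, pthread and sysvsem; the safest choice varies by
# platform, which is exactly why the packaged default is hard to get right.
AcceptMutex posixsem
```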

> 
> One thing that I did notice in the other person's thread is that
> I did *not* receive an error for MaxClients.  (Apache seems to be
> shutting down extra httpd.webservices tasks a few minutes after boot.)
> This is why I'm not 100% certain that the same underlying problems
> caused both of our problems.

I no longer believe that that is the cause of the issue.
That was a working hypothesis at the time, but it did not pass the test.
Plenty of our clients have hit the MaxClients limit at one point or another 
without Apache crashing.

> 
> While looking in the portal_error_log file, I noticed these emergency
> notices that correspond with every PF restart or server reboot:
> 
> portal_error_log:[Thu Mar 05 17:13:28 2015] [emerg] mod_qos(007): could 
> not determine MaxClients/MaxRequestWorkers! You MUST set this directive 
> within the Apache configuration file.
> portal_error_log:[Thu Mar 05 17:13:30 2015] [emerg] mod_qos(007): could 
> not determine MaxClients/MaxRequestWorkers! You MUST set this directive 
> within the Apache configuration file.
> 

I have also seen these messages.
This is mod_qos complaining rather loudly about something that will not harm 
you if you are not using it.
I would not worry about it.
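
For reference, mod_qos derives its limits from the MPM configuration. An 
explicit declaration like the following (values purely illustrative) is what 
it is looking for, although hard-coding it defeats the automatic sizing:

```apache
# Illustrative prefork values only -- PacketFence computes these from
# system memory at startup, so declaring them by hand is normally unnecessary.
<IfModule prefork.c>
    StartServers        5
    MinSpareServers     5
    MaxSpareServers    10
    MaxClients        256
</IfModule>
```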


> 
> Checking the portal's config file reveals these for where Apache
> is trying to fill in those values:
> 
> $MaxClients = pf::services::manager::httpd::calculate_max_clients(pf::services::manager::httpd::get_total_system_memory());
> $StartServers = pf::services::manager::httpd::calculate_start_servers($MaxClients);
> $MinSpareServers = pf::services::manager::httpd::calculate_min_spare_servers($MaxClients);
> 
> Anyway, I was wondering if there is a race condition on startup
> that prevents whatever provides the above values from being loaded
> and ready to provide the answers?


Those values cannot literally be undefined. 
Apache would not start. 
At worst it would fall back to its default values.
I really think the mod_qos message is misleading.
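
For what it is worth, those helpers size httpd from available memory. A 
hypothetical sketch of that kind of calculation follows; the real formulas 
live in the PacketFence source, and the per-child figure here is invented:

```shell
# Hypothetical sketch of the memory-based sizing the
# pf::services::manager::httpd helpers perform.  The ~10 MB per-child
# estimate and the half-of-RAM budget are illustrative assumptions.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
per_child_kb=10240                               # assume ~10 MB per httpd child
max_clients=$(( total_kb / 2 / per_child_kb ))   # budget at most half of RAM
start_servers=$(( max_clients / 4 ))
echo "MaxClients=$max_clients StartServers=$start_servers"
```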


> 
> Something else that I did notice is that multiple pfdns and/or
> pfdhcplistener jobs seem to be starting on server boot.  I don't
> know if this is a symptom of the underlying problem that I was
> having, or potentially the cause?  I can say that an overly-aggressive
> service watch cron schedule (like 2 minutes) combined with having PF
> restart failed services creates a race condition where during boot
> the cron job starts some services before the PF startup script, which
> may explain some of these.  (I cloned my running PF VM and booted the
> clone over a dozen times, and had to kill the cron entry to get a clean
> boot with an aggressive cron schedule.)  Service watch may need some
> locking mechanism so that it knows not to restart services during boot,
> because even with a longer cron schedule it could still potentially
> happen to anyone.  Maybe also a conditional on the pfdhcplistener so
> that it doesn't start one if another one is running (no matter what
> the PID file mechanism says)?
> 
> I'm also wondering if multiple running listeners isn't the cause of my
> other observation from months ago that PF seems to be creating duplicate
> node entries for some new machines???
> 
Possibly.
Although what I have seen happen most often when multiple pfdhcplisteners are 
running is database deadlocks preventing correct insertion of the records.

It’s normal for pfdhcplistener to be running once per interface.
More than that should not happen. 
pfdhcplistener is already not supposed to start if another one is running on 
the same interface...
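
A belt-and-suspenders check, independent of the PID file, could look for a 
listener on the interface before starting another. This is a hypothetical 
sketch, not the actual startup logic:

```shell
IF=eth0   # hypothetical interface name
# Look for an existing listener on this interface regardless of what the
# PID file says.  The [p] bracket trick stops pgrep -f from matching this
# script's own command line.
if pgrep -f "[p]fdhcplistener.*-i[= ]$IF" >/dev/null; then
    status="already running"
else
    status="safe to start"
fi
echo "pfdhcplistener on $IF: $status"
```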

Finding a perfect solution that handles all possible cases of processes dying, 
being slow to stop when asked to exit, or just restarting in general has been 
a challenge.
Your suggestions are welcome.
We will investigate these.
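
One low-tech way to keep the cron job from racing the boot scripts is to wrap 
it in flock(1), so a second copy exits instead of starting services twice. In 
this sketch, echo stands in for the real watch command:

```shell
LOCK=/tmp/pf-service-watch.lock
# 'echo watch ran' stands in for the real service-watch invocation.
# -n makes a concurrent second run fail fast instead of queuing behind
# the first, so overlapping cron fires simply skip.
out=$(flock -n "$LOCK" -c 'echo watch ran' || echo "another watch is running; skipping")
echo "$out"
```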

If service watch is causing trouble, I suggest looking into a dedicated process 
monitoring tool like monit.
It usually does a better job.
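
A minimal monit stanza might look like this; the pid file path and pfcmd 
location are assumptions based on a default /usr/local/pf install, so adjust 
them to your layout:

```
# Hypothetical monit check -- paths below are assumptions.
check process httpd.webservices with pidfile /usr/local/pf/var/run/httpd.webservices.pid
  start program = "/usr/local/pf/bin/pfcmd service httpd.webservices start"
  stop program  = "/usr/local/pf/bin/pfcmd service httpd.webservices stop"
  if 3 restarts within 5 cycles then timeout
```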

Best regards,
--
Louis Munro
lmu...@inverse.ca  ::  www.inverse.ca 
+1.514.447.4918 x125  :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)



_______________________________________________
PacketFence-users mailing list
PacketFence-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/packetfence-users
