Hi Brian,
Could you post your conf/httpd.conf.d/httpd.webservices, please?
Also, post the output of this command:
# cat /proc/$(pgrep -u pf -nf webservices)/limits
A search for that "couldn't grab the accept mutex" error suggests you may be
hitting a resource limit somewhere.
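
If the full output is noisy, here is a minimal sketch for narrowing it down to the rows worth reading first. The helper name show_limits is made up for illustration, and the choice of rows is only a guess at likely suspects, not a diagnosis.

```shell
# Sketch: print the header plus a few selected rows of a /proc/<pid>/limits file.
# show_limits is a hypothetical helper; its argument is the path to a limits file.
show_limits() {
    grep -E '^(Limit|Max processes|Max open files|Max locked memory|Max pending signals)' "$1"
}

# Usage (as root), reusing the pgrep pattern from the command above:
# show_limits "/proc/$(pgrep -u pf -nf webservices)/limits"
```

The complete, unfiltered output is still the most useful thing to post to the list.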
--
Louis Munro
[email protected] :: www.inverse.ca
+1.514.447.4918 x125 :: +1 (866) 353-6153 x125
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
(www.packetfence.org)
On 2014-10-02, at 23:15 , Brian Lucas <[email protected]> wrote:
> Well, the problem has resurfaced. The relevant log entries are:
>
> [Thu Oct 02 22:04:07 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
> [Thu Oct 02 22:04:07 2014] [alert] Child 4973 returned a Fatal error...
> Apache is exiting!
>
>
>
> I can't figure this out. Any help would be appreciated. webservices is
> definitely what is crashing. I find it a bit odd that there are entries in
> packetfence.log after this crash attributed to httpd.webservices and it takes
> some time after this crash for everything to fall apart. pfmon doesn't seem
> to successfully restart it either.
>
> Oct 02 22:10:13 pfcmd.pl(5590) INFO: Daemon httpd.webservices took 7.219
> seconds to start. (pf::services::manager::launchService)
>
> makes it appear as if it started back up, but it did not, and there is no
> relevant info in httpd.webservices.error.
>
> I am at a loss :(
>
> On Wed, Oct 1, 2014 at 10:00 AM, Brian Lucas <[email protected]> wrote:
> Thanks for following along, everyone. The problem appears to be resolved. I'm
> guessing it was mostly the two pfdhcplistener processes per interface causing
> deadlocks. But cleaning the database sure didn't hurt! Cheers!
>
> On Sep 30, 2014 4:30 PM, "Brian Lucas" <[email protected]> wrote:
> Further research shows that I had two pfdhcplisteners running on each
> interface... bad shutdown somewhere along the way? Some reading of past
> posts shows that this can cause database issues as well. Here's hoping.
>
> On Tue, Sep 30, 2014 at 2:52 PM, Brian Lucas <[email protected]> wrote:
> I'm going to have to wait for traffic to get back to normal to see if we're
> okay, but it looks like the database maintenance script has not been running
> since the update because the MySQL password needed to be re-entered. The
> radacct table MAY have grown out of hand, causing a slowdown that in turn
> caused the webservices to crash and bring everything down. I fixed the
> password and cleaned the database. I will post back with results once my
> users are back.
>
> Brian
>
> On Tue, Sep 30, 2014 at 10:19 AM, Brian Lucas <[email protected]> wrote:
> There hasn't been enough usage on the network to cause my crash to happen
> again yet, but I thought maybe this could help diagnose the problem. Here is
> a snippet of httpd.webservices.error around the time of the crashes.
>
>
> [Sun Sep 28 14:01:16 2014] [notice] Apache/2.2.15 (Unix) mod_ssl/2.2.15
> OpenSSL/1.0.1e-fips mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal
> operations
>
> [Sun Sep 28 14:02:21 2014] [error] server reached MaxClients setting,
> consider raising the MaxClients setting
>
> [Sun Sep 28 14:06:24 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
> [Sun Sep 28 14:06:24 2014] [alert] Child 11638 returned a Fatal error...
> Apache is exiting!
>
> [Sun Sep 28 14:06:46 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
> [Sun Sep 28 14:06:47 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
> [Sun Sep 28 14:17:24 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
> [Sun Sep 28 14:17:25 2014] [emerg] (4)Interrupted system call: couldn't grab
> the accept mutex
>
>
>
> Seeing the "server reached MaxClients setting" error, as well as the huge
> number of connections per minute in the httpd.webservices.access file, makes
> me wonder if this is the problem.
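
Inline note: to put a number on "connections per minute", something like the sketch below could be run against the access log. The helper name requests_per_minute is invented for illustration, and the log path in the usage line is an assumption based on a default install under /usr/local/pf. It skips Apache's own "internal dummy connection" entries so that only real RPC calls are counted.

```shell
# Sketch: count requests per minute in an Apache access log.
# requests_per_minute is a hypothetical helper; its argument is the log path.
requests_per_minute() {
    # Field 4 looks like "[28/Sep/2014:14:03:17"; characters 2..18 are
    # the date plus hour and minute, e.g. "28/Sep/2014:14:03".
    grep -v 'internal dummy connection' "$1" |
        awk '{ print substr($4, 2, 17) }' |
        sort | uniq -c
}

# Usage (path is an assumption, adjust to your install):
# requests_per_minute /usr/local/pf/logs/httpd.webservices.access
```

Each output line is a count followed by the minute it applies to, which can be compared against the MaxClients value in the webservices configuration.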
>
>
>
> Here is a snippet of that log from around the same time frame:
>
>
>
> 127.0.0.1 - - [28/Sep/2014:14:03:17 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1"
> 200 35 "-" "-"
>
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1"
> 200 35 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:03 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:03 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:02:19 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1"
> 200 35 "-" "-"
>
> 127.0.0.1 - radius_accounting [28/Sep/2014:14:03:15 -0500] "POST / HTTP/1.1"
> 200 35 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:06 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1" 204 -
> "-" "-"
>
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1"
> 204 - "-" "-"
>
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1"
> 204 - "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:01:37 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:18 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:19 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:04 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1" 204 -
> "-" "-"
>
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:19 -0500] "POST / HTTP/1.1" 204 -
> "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - radius_authorize [28/Sep/2014:14:03:17 -0500] "POST / HTTP/1.1"
> 200 66 "-" "-"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:20 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:21 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:22 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST / HTTP/1.1"
> 204 - "-" "-"
>
> 127.0.0.1 - trigger_violation [28/Sep/2014:14:03:22 -0500] "POST / HTTP/1.1"
> 204 - "-" "-"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:23 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:24 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - update_iplog [28/Sep/2014:14:03:24 -0500] "POST / HTTP/1.1" 204 -
> "-" "-"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:25 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:26 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:27 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
> 127.0.0.1 - - [28/Sep/2014:14:03:28 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
>
> 127.0.0.1 - - [28/Sep/2014:14:03:29 -0500] "OPTIONS * HTTP/1.0" 200 - "-"
> "Apache (internal dummy connection)"
>
>
>
>
>
>
> On Mon, Sep 29, 2014 at 6:09 PM, Brian Lucas <[email protected]> wrote:
> Will do and post back. Right now I have all services restarting once an hour
> from cron to avoid disruption to my users. Will have to wait until an odd
> hour.
>
> Brian
>
> On Sep 29, 2014 3:22 PM, "Louis Munro" <[email protected]> wrote:
>
>
> On 2014-09-26, at 19:19 , Brian Lucas <[email protected]> wrote:
>
>> All,
>>
>> I'm seeing 1000s of the following error per hour on our setup after the
>> update to 4.4. A restart of all services clears it up for a time, but it
>> comes back. Any suggestions as to the problem?
>>
>> Fri Sep 26 18:16:45 2014 : Error: rlm_perl: An error occurred while
>> processing the authorize RPC request: An error occured while sending a
>> MessagePack request: 7 Couldn't connect to server couldn't connect to host
>> at /usr/local/pf/lib//pf/radius/rpc.pm line 52.
>>
>>
>
> Hi Brian,
>
> Next time this happens, try to see why radius could not connect to the
> webservice.
>
> By default, the webservice runs on port 9090 on localhost.
> If you try to connect to it yourself, does it work?
>
> e.g. run this command:
>
> # curl -kvLI http://localhost:9090
>
> And see what it says.
>
> --
> Louis Munro
> [email protected] :: www.inverse.ca
> +1.514.447.4918 x125 :: +1 (866) 353-6153 x125
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence
> (www.packetfence.org)
>
>
> ------------------------------------------------------------------------------
> Slashdot TV. Videos for Nerds. Stuff that Matters.
> http://pubads.g.doubleclick.net/gampad/clk?id=160591471&iu=/4140/ostg.clktrk
> _______________________________________________
> PacketFence-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/packetfence-users