Re: Version 1.1.1 stops responding
Hi,

> I have 2 servers with identical hardware/software configs. Both servers hang at the same time. Stopping/starting the daemon doesn't resolve the issue, rebooting the box does.

Well, in my case the server doesn't respond to a normal kill, so the start/stop scripts indeed wouldn't do the job. Have you tried killing it with kill -SIGKILL instead of the init scripts?

> I was assuming it had something to do with the sql module because that is where it paused (see: sql hangs, was (conflicts/duplicates need))

For me it's also somewhere within sql. I will run our server in valgrind this morning, as Alan suggested elsewhere in the thread. I hope it will be responsive enough that I can keep it running for a few hours until the error eventually occurs...

Greetings,

Stefan

--
Stefan WINTER
RESTENA Foundation - Réseau Téléinformatique de l'Education Nationale et de la Recherche
R&D Engineer
6, rue Richard Coudenhove-Kalergi
L-1359 Luxembourg
email: [EMAIL PROTECTED]     Tel.: +352 424409-1
http://www.restena.lu        Fax: +352 422473

- List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Re: Version 1.1.1 stops responding
This definitely sounds like a problem with either your database server or your firewall. The most likely culprit is the firewall.

-Peter

On Mon 10 Apr 2006 13:43, Duane Cox wrote:
> My problem may indeed be with my init scripts. I'll look into that and verify the status of the daemon.
>
> And for the record, what I mean by hang is that the server doesn't segfault or coredump, it just pauses processing data... Give the server 30 seconds, or maybe even 2-3 minutes, then it will start processing packets again. If you can catch the server during this act and look at the debug output, you can see that the sql module has hung. Now I don't know if this is the cause or the effect; maybe it means something to someone.
>
> I am going to attempt to load some previous versions on the secondary server to see if I can alleviate or pinpoint the problem.
>
> Thanks for the feedback.
>
> ----- Original Message -----
> From: Stefan Winter [EMAIL PROTECTED]
> To: FreeRadius users mailing list freeradius-users@lists.freeradius.org
> Sent: Monday, April 10, 2006 1:13 AM
> Subject: Re: Version 1.1.1 stops responding

--
Peter Nixon
http://www.peternixon.net/
PGP Key: http://www.peternixon.net/public.asc
Re: Version 1.1.1 stops responding
Duane Cox [EMAIL PROTECTED] wrote:
> I have 2 servers with identical hardware/software configs. Both servers hang at the same time. Stopping/starting the daemon doesn't resolve the issue, rebooting the box does.

That's fairly bad. I'm not sure how something in the application layer could cause that. Maybe an OS issue? But then why would *both* boxes hang at the *same* time?

> I was assuming it had something to do with the sql module because that is where it paused (see: sql hangs, was (conflicts/duplicates need))

Maybe a wider network issue? I'm just guessing here...

Alan DeKok.
Re: Version 1.1.1 stops responding
On Apr 9, 2006, at 17:46, Alan DeKok wrote:
> Duane Cox [EMAIL PROTECTED] wrote:
>> I have 2 servers with identical hardware/software configs. Both servers hang at the same time. Stopping/starting the daemon doesn't resolve the issue, rebooting the box does.
>
> That's fairly bad. I'm not sure how something in the application layer could cause that. Maybe an OS issue? But then why would *both* boxes hang at the *same* time?
>
>> I was assuming it had something to do with the sql module because that is where it paused (see: sql hangs, was (conflicts/duplicates need))
>
> Maybe a wider network issue? I'm just guessing here...

Check the times very closely. They may be 10 seconds apart. I had a problem with a module that was crashing. The first request took out the primary server and then, when it didn't respond, the client tried the backup 10 seconds later and crashed it also.
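The timing check suggested above can be sketched as a small shell helper. This is only an illustration: the function name within_window, the use of epoch seconds, and the 10-second window are assumptions based on the retry delay described in the message, not something FreeRADIUS provides.

```shell
# Hypothetical helper: succeed if two hang times (in epoch seconds)
# fall within a client retry window of each other.
within_window() {
    first=$1
    second=$2
    window=$3
    diff=$((second - first))
    # Take the absolute value so the argument order does not matter.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$window" ]
}

# Example with made-up timestamps: primary hung at 1000, backup at 1010.
if within_window 1000 1010 10; then
    echo "hangs are close enough to be one request failing over"
fi
```

If the two hang times fall inside the window, that would point toward a single request taking out the primary and then the backup, rather than both servers failing independently.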
Re: Version 1.1.1 stops responding
I'm seeing the same thing here with 1.1.1.

I have 2 servers with identical hardware/software configs. Both servers hang at the same time. Stopping/starting the daemon doesn't resolve the issue, rebooting the box does.

I was assuming it had something to do with the sql module because that is where it paused (see: sql hangs, was (conflicts/duplicates need))

----- Original Message -----
From: King, Michael [EMAIL PROTECTED]
To: FreeRadius users mailing list freeradius-users@lists.freeradius.org
Sent: Thursday, March 23, 2006 9:24 AM
Subject: Version 1.1.1 stops responding
Re: Version 1.1.1 stops responding
Stefan Winter [EMAIL PROTECTED] wrote:
> When I did it in -X mode, it segfaulted. The end of the -X output is:
> ...

Could you do the same, but with core dumps enabled (ulimit -c unlimited) and symbols? That would help a lot in tracking down the problem. Also, which OS are you running on, etc.?

Alan DeKok.
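A minimal sketch of the request above, assuming a POSIX shell. The helper name run_with_core is hypothetical, and /usr/sbin/freeradius is simply the path used elsewhere in this thread; substitute your own binary. Note also that the -X output posted elsewhere in the thread shows allow_core_dumps = no, which may need changing in radiusd.conf as well.

```shell
# Hypothetical helper: run a command with core dumps enabled for that
# command only. The limit is raised inside a subshell, so the calling
# shell keeps its own limits.
run_with_core() {
    (
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise the core size limit" >&2
        "$@"
    )
}

# Example (commented out so nothing is started by accident):
# run_with_core /usr/sbin/freeradius -X
```

If a core file is written, loading it into gdb together with the same binary and running bt should show where the crash happened, provided the binary still has its symbols.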
Re: Version 1.1.1 stops responding
On Mon, 2006-03-27 at 17:37 -0500, Alan DeKok wrote:
> (gdb) info threads
>
> That *may* be enough. What will also help (if you have symbols) is:
>
> (gdb) thread 1
> (gdb) bt
> (gdb) thread 2
> (gdb) bt
> (gdb) thread 3
> (gdb) bt
> ...

The easiest way of doing this is

  (gdb) thread apply all bt

On other projects I work on, we normally use

  (gdb) thread apply all bt full

to include all the local variable values as well.

To save yourself a lot of cut and paste, before issuing the commands to get all the information you should do

  (gdb) set logging file somefilename
  (gdb) set logging on

Then all the command output will be sent to the logfile somefilename as well as being displayed.

Stuart

> For each of the threads it talks about in info threads.
>
> Alan DeKok.
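The logging and backtrace commands above can be collected into a gdb command file so they do not have to be typed while the server is hung. The following is a sketch only: the helper name write_gdb_commands and the file names are hypothetical, "set pagination off" is an addition not mentioned in the thread, and attaching to the running process with -p is an assumption about how you would use it (the thread itself starts radiusd from inside gdb).

```shell
# Hypothetical helper: write a gdb command file that logs a backtrace
# of every thread to the given log file.
write_gdb_commands() {
    cmdfile=$1
    logfile=$2
    cat > "$cmdfile" <<EOF
set pagination off
set logging file $logfile
set logging on
info threads
thread apply all bt full
set logging off
EOF
}

# Example (commented out; the file names are placeholders and the pid
# file path is the one shown in the -X output elsewhere in the thread):
# write_gdb_commands /tmp/radiusd-bt.gdb /tmp/radiusd-bt.log
# gdb -batch -x /tmp/radiusd-bt.gdb -p "$(cat /var/run/radiusd/radiusd.pid)"
```

As noted in the thread, the backtraces are only useful if the binary has not been stripped of its symbols.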
Re: Version 1.1.1 stops responding
Stuart Auchterlonie [EMAIL PROTECTED] wrote:
> The easiest way of doing this is
>
> (gdb) thread apply all bt
> ...

Thanks. I've updated doc/bugs with that information.

Alan DeKok.
RE: Version 1.1.1 stops responding
Just for some reference (trying to find commonalities):

What OS/distro are you running?
  I'm on the Debian testing release.

How did you install? (Prebuilt binary / created local package and installed / installed from source)
  I created a local Debian package and installed it.

What modules did you enable?
  PEAP, TTLS, and TLS

What is your authentication source?
  Using ntlm_auth against Active Directory 2003

What is your supplicant?
  98% Windows XP built-in supplicant. The rest are Linux / Mac clients.

I wonder if this has something to do with this bug that got squashed:

> 2006.03.20 v1.0.5, and v1.1.0 - A validation issue exists with the EAP-MSCHAPv2 module in all versions from 1.0.0 (where the module first appeared) to 1.1.0. Insufficient input validation was being done in the EAP-MSCHAPv2 state machine. A malicious attacker could manipulate their EAP-MSCHAPv2 client state machine to potentially convince the server to bypass authentication checks. This bypassing could also result in the server crashing. We recommend that administrators upgrade immediately.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stefan Winter
Sent: Monday, March 27, 2006 1:49 AM
To: FreeRadius users mailing list
Subject: Re: Version 1.1.1 stops responding
Re: Version 1.1.1 stops responding
Stefan Winter [EMAIL PROTECTED] wrote:
> Interesting. This morning I again found radiusd claiming to still be listening on its ports, but not processing anything any more. As other logs showed, someone logged into an Access Point via TTLS at 8:22 and at 8:25 the Nagios monitoring system marked the RADIUS server as critical. The Nagios scan interval is three minutes. So it could very well be that FreeRADIUS stopped processing packets when it tried to do TTLS. Sounds similar to your case, just that it didn't segfault. Note that we usually use TTLS several times a day, and FreeRADIUS shows this behaviour only sporadically.

That's not nice.

> I now reverted to 1.1.0 in the hope that it's better there. The way it is now is... disturbing.

I agree. I don't see why it's the case, either. Maybe the re-arrangement of SSL code from rlm_eap_tls to libeap broke it, but I don't see why.

Until we can get more information about what's happening (strace/ktrace, or gdb backtrace), there isn't much anyone can do to fix it.

Alan DeKok.
RE: Version 1.1.1 stops responding
-----Original Message-----
On Behalf Of Alan DeKok

> Until we can get more information about what's happening (strace/ktrace, or gdb backtrace), there isn't much anyone can do to fix it.

How would I create those traces? (I'm looking for a suggested command line, since I don't normally use those programs.)
Re: Version 1.1.1 stops responding
King, Michael [EMAIL PROTECTED] wrote:
> How would I create those traces? (I'm looking for a suggested command line, since I don't normally use those programs)

I'd suggest gdb, and do it in a testing environment if at all possible, to avoid hitting your main server. Also, you *must* have symbols in the binary, meaning that the make install process can't strip the binaries. You may need to edit the Makefiles for this...

  $ gdb radiusd
  (gdb) set args -d ...      (your normal radiusd args)
  (gdb) run

And when it stops responding, hit CTRL-C, and see what's up:

  (gdb) info threads

That *may* be enough. What will also help (if you have symbols) is:

  (gdb) thread 1
  (gdb) bt
  (gdb) thread 2
  (gdb) bt
  (gdb) thread 3
  (gdb) bt
  ...

For each of the threads it talks about in info threads.

Alan DeKok.
Re: Version 1.1.1 stops responding
King, Michael [EMAIL PROTECTED] wrote:
> I wonder if this has something to do with this bug that got squashed
>
> 2006.03.20 v1.0.5, and v1.1.0 - A validation issue exists with the EAP-MSCHAPv2 module

No. EAP-MSCHAP doesn't do TLS, and that code change cannot affect anything but people trying to attack the server.

Alan DeKok.
Re: Version 1.1.1 stops responding
> Mine seg faulted as well..
> Here's the last few lines of the freeradius -X -A
>
> modcall: entering group authenticate for request 1002
> rlm_eap: Request found, released from the list
> rlm_eap: EAP/peap
> rlm_eap: processing type peap
> rlm_eap_peap: Authenticate
> rlm_eap_tls: processing TLS
> rlm_eap_tls: Length Included
> eaptls_verify returned 11

Interesting. This morning I again found radiusd claiming to still be listening on its ports, but not processing anything any more. As other logs showed, someone logged into an Access Point via TTLS at 8:22 and at 8:25 the Nagios monitoring system marked the RADIUS server as critical. The Nagios scan interval is three minutes. So it could very well be that FreeRADIUS stopped processing packets when it tried to do TTLS. Sounds similar to your case, just that it didn't segfault. Note that we usually use TTLS several times a day, and FreeRADIUS shows this behaviour only sporadically.

I now reverted to 1.1.0 in the hope that it's better there. The way it is now is... disturbing.

Greetings,

Stefan Winter
RE: Version 1.1.1 stops responding
I'm running it in debug mode (and piping it to a file):

  freeradius -X -A crash.log

After a few hours this is what I got on the command line:

  rad2:/home/mking# /usr/sbin/freeradius -X -A crash.log
  Killed
  rad2:/home/mking#

The last few lines from the log file are:

  rlm_eap: Request found, released from the list
  rlm_eap: EAP/peap
  rlm_eap: processing type peap
  rlm_eap_peap: Authenticate
  rlm_eap_tls: processing TLS
  rlm_eap_tls: Length Included
  eaptls_verify returned 11
  (other): before/accept initialization
  TLS_accept: before/accept initialization
  rlm_eap_tls: TLS 1.0 Handshake [length 0041], ClientHello

And that's it?

Unfortunately, I deleted the log file by mistake. (I really should drink my coffee before typing.) I'm attempting to recreate it now. It shouldn't be long (I hope).
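One way to capture the debug output without risking an earlier log is sketched below. It assumes a POSIX shell with tee available; the helper name capture_debug and the log file naming are hypothetical, and -X is simply the debug flag already used in this thread.

```shell
# Hypothetical helper: run a command, show its output, and keep a copy
# (stdout and stderr together) in a log file.
capture_debug() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Example (commented out so the server is not started by accident).
# The timestamp in the name keeps each run in its own file.
# capture_debug "crash-$(date +%Y%m%d-%H%M%S).log" /usr/sbin/freeradius -X
```

Messages such as "Killed" or "Segmentation fault" are reported by the shell rather than by the server, so they may not end up in the log file; the last lines the server wrote before dying still will.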
Re: Version 1.1.1 stops responding
Hi,

I have a follow-up as well. After configuring everything for doing SIGHUPs it turned out that after a SIGHUP, the process sits there and does nothing any more. When I did killall -HUP radiusd in non-debug mode, the process kept running, but didn't process anything any more. When I did it in -X mode, it segfaulted. The end of the -X output is:

  Sending Access-Accept of id 132 to 158.64.X.Y port 10270
  Finished request 328
  Going to the next request
  Cleaning up request 315 ID 35 with timestamp 442404d4
  Waking up in 1 seconds...
  Waking up in 1 seconds...
  Reloading configuration files.
  reread_config: reading radiusd.conf
  Config: including file: /etc/raddb/proxy.conf
  Config: including file: /etc/raddb/clients.conf
  Config: including file: /etc/raddb/eap.conf
  Config: including file: /etc/raddb/sql-normal-vpn.conf
  Config: including file: /etc/raddb/sql-normal-eduroam.conf
  Config: including file: /etc/raddb/sql-luxdsl-cops.conf
  Config: including file: /etc/raddb/sql-luxdsl-erx.conf
  main: prefix = /usr/local/freeradius
  main: localstatedir = /var
  main: logdir = /var/log/radius
  main: libdir = /usr/local/freeradius/lib
  main: radacctdir = /var/log/radius/radacct
  main: hostname_lookups = no
  main: max_request_time = 30
  main: cleanup_delay = 5
  main: max_requests = 1024
  main: delete_blocked_requests = 0
  main: port = 0
  main: allow_core_dumps = no
  main: log_stripped_names = no
  main: log_file = /var/log/radius/radius.log
  main: log_auth = yes
  main: log_auth_badpass = no
  main: log_auth_goodpass = no
  main: pidfile = /var/run/radiusd/radiusd.pid
  main: user = radiusd
  main: group = radiusd
  main: usercollide = no
  main: lower_user = no
  main: lower_pass = no
  main: nospace_user = no
  main: nospace_pass = no
  main: checkrad = /usr/local/freeradius/sbin/checkrad
  main: proxy_requests = yes
  security: max_attributes = 200
  security: reject_delay = 1
  security: status_server = no
  main: debug_level = 0
  read_config_files: reading dictionary
  read_config_files: reading naslist
  Using deprecated naslist file. Support for this will go away soon.
  read_config_files: reading clients
  read_config_files: reading realms
  Segmentation fault

HTH,

Stefan Winter
RE: Version 1.1.1 stops responding
Mine seg faulted as well.. (This time I didn't overwrite the log)

  rad2:/home/mking# /usr/sbin/freeradius -X -A crash.log
  Segmentation fault
  rad2:/home/mking#

I don't believe running /usr/sbin/freeradius -X -A is capturing anything useful. Is there something else I can do?

Here are the last few lines of the freeradius -X -A output:

  rlm_detail: /var/log/freeradius/radacct/%{Client-IP-Address}/auth-detail-%Y%m%d expands to /var/log/freeradius/radacct/10.0.1.32/auth-detail-20060324
  modcall[authorize]: module auth_log returns ok for request 1002
  modcall[authorize]: module chap returns noop for request 1002
  modcall[authorize]: module mschap returns noop for request 1002
  rlm_realm: No '@' in User-Name = BSC\ddelutis, looking up realm NULL
  rlm_realm: No such realm NULL
  modcall[authorize]: module suffix returns noop for request 1002
  rlm_eap: EAP packet type response id 28 length 80
  rlm_eap: No EAP Start, assuming it's an on-going EAP conversation
  modcall[authorize]: module eap returns updated for request 1002
  users: Matched entry DEFAULT at line 152
  users: Matched entry DEFAULT at line 171
  modcall[authorize]: module files returns ok for request 1002
  modcall: leaving group authorize (returns updated) for request 1002
  rad_check_password: Found Auth-Type EAP
  auth: type EAP
  Processing the authenticate section of radiusd.conf
  modcall: entering group authenticate for request 1002
  rlm_eap: Request found, released from the list
  rlm_eap: EAP/peap
  rlm_eap: processing type peap
  rlm_eap_peap: Authenticate
  rlm_eap_tls: processing TLS
  rlm_eap_tls: Length Included
  eaptls_verify returned 11
Version 1.1.1 stops responding
So I built 1.1.1 on Debian. After a variable number of hours it stops responding (sometimes 2 hours, sometimes 16 hours).

Now here's where it gets weird (and makes me suspect that freeRADIUS might not be the root cause): if I stop and restart the freeRADIUS service, it continues to ignore RADIUS packets. But if I restart the server (hard reboot) it works fine, till it stops responding again.

Obviously this is not enough information to help you diagnose the problem. How do I gather that information?

The box is a 233 Pentium with 64 megs of RAM. It has about 15 APs, with around 100 users (not simultaneous, maybe 30 simultaneous).

So what's the suggested way of gathering more info? Running debug mode piping to a text file?
Re: Version 1.1.1 stops responding
On Thu, 2006-03-23 at 09:24 -0500, King, Michael wrote:
> So I built 1.1.1 on Debian. After a variable number of hours it stops responding (sometimes 2 hours, sometimes 16 hours).
>
> Now here's where it gets weird (and makes me suspect that freeRADIUS might not be the root cause): if I stop and restart the freeRADIUS service, it continues to ignore RADIUS packets.

I am seeing a similar problem on RedHat. I originally thought it was only happening when I sent a HUP signal, but it turns out this is not the case. However, in my case all I have to do to fix it is restart the service (I do not need to reboot the entire operating system).

Ben
Re: Version 1.1.1 stops responding
Hi,

> I am seeing a similar problem on RedHat. I originally thought it was only happening when I sent a HUP signal, but it turns out this is not the case. However, in my case all I have to do to fix it is restart the service (I do not need to reboot the entire operating system).

For the record: this happened to me *once* as well (SuSE 8.2). That coincided with an access point crashing in the middle of an authentication, so I thought it might just be that the AP sent a very weird packet while dying. Really strange symptoms... radiusd and all its threads are running, and bound to the ports they should be, but there is no indication of a received packet. Restarting the service did the trick for me as well.

Greetings,

Stefan Winter