Re: (RADIATOR) Radiator going down after Oracle SQL Timeout
Hi Hugh,

well... time passed and this happened again but, alas, there is no
message whatsoever that indicates what happened.

I kept using supervise which, instead of mailing me, is sending all of
standard output + standard error through a logger (multilog) which
timestamps it and writes it to a file.

Here's Radiator's log of the moment of the problem:

===START OF RADIATOR LOG===
Wed Jan 30 01:21:44 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 535302554, snmp-community
Wed Jan 30 01:22:15 2002: DEBUG: Packet dump:
*** Received from 10.133.56.33 port 1645
Code:       Access-Request
Identifier: 124
Authentic:  <236>0<179><185>W@"-)A4<194>r?w-
Attributes:
        NAS-IP-Address = 10.133.56.33
        NAS-Port = 30
        NAS-Port-Type = Async
        User-Name = "uncliente@pbm"
        Called-Station-Id = "0380"
        Calling-Station-Id = "1141399338"
        User-Password = "<<247>:<232><232>d<224><135>@<255>`QRs.<218>"
        Service-Type = Framed-User
        Framed-Protocol = PPP

Wed Jan 30 01:22:15 2002: DEBUG: Rewrote user name to uncliente@pbm
Wed Jan 30 01:22:15 2002: DEBUG: Check if Handler Pert-PreRegUser-Flag = 1, Request-Type = Access-Request should be used to handle this request
Wed Jan 30 01:22:15 2002: DEBUG: Check if Handler Request-Type = Access-Request should be used to handle this request
Wed Jan 30 01:22:15 2002: DEBUG: Handling request with Handler 'Request-Type = Access-Request'
Wed Jan 30 01:22:15 2002: DEBUG: Rewrote user name to uncliente@pbm
Wed Jan 30 01:22:15 2002: DEBUG: SessDBUsers Deleting session for uncliente@pbm, 10.133.56.33, 30
Wed Jan 30 01:22:15 2002: DEBUG: do query is: DELETE FROM USUARIOS_EN_LINEA WHERE USUA_IP_NAS='10.133.56.33' AND USUA_PORT=030
Wed Jan 30 01:22:21 2002: DEBUG: Handling with Radius::AuthSQL
Wed Jan 30 01:22:21 2002: DEBUG: Handling with Radius::AuthSQL:UserGetPassword
Wed Jan 30 01:22:21 2002: DEBUG: Query is: SELECT U.USU_CLAVE, S.SER_CODIGO, S.SER_MAX_SESSION_CONCURRENTES, S.TIMEFRAMEID, S.SER_GEN_CHECK, S.SER_GEN_REPLY, U.USU_IP_NRO_FIJA, U.USU_IP_MASC_FIJA,
U.USU_TIEMPO_RESTANTE, U.USU_BYTES_RESTANTES, U.USU_SUSPENDIDO, U.USU_GEN_CHECK, U.USU_GEN_REPLY, VS.VISP_SER_VALID_DNIS FROM USUARIOS U, VISP V, SERVICIOS S, VISP_SERVICIOS VS WHERE U.VISP_CODIGO = V.VISP_CODIGO AND U.SER_CODIGO = S.SER_CODIGO AND U.USU_CODIGO = 'uncliente' AND U.VISP_CODIGO = 'pbm' AND V.VISP_CODIGO = VS.VISP_CODIGO AND S.SER_CODIGO = VS.SER_CODIGO AND '0380' LIKE VS.VISP_SER_VALID_DNIS
Wed Jan 30 01:22:37 2002: DEBUG: Radius::AuthSQL looks for match with uncliente@pbm
Wed Jan 30 01:22:37 2002: DEBUG: Query is: SELECT USUA_IP_NAS, USUA_PORT, USUA_SESION_ID FROM USUARIOS_EN_LINEA WHERE USU_CODIGO ='uncliente' AND VISP_CODIGO='pbm'
Wed Jan 30 01:23:16 2002: DEBUG: Radius::AuthSQL ACCEPT:
Wed Jan 30 01:23:16 2002: DEBUG: Handling with PORTLIMITCHECK
Wed Jan 30 01:23:16 2002: DEBUG: Query is: SELECT COUNT(*) FROM USUARIOS_EN_LINEA WHERE VISP_CODIGO = 'pbm' AND SER_CODIGO = 'Teletrabajo_PBM' AND '0380' LIKE VISP_SER_VALID_DNIS
Wed Jan 30 01:24:16 2002: ERR: Execute failed for 'SELECT COUNT(*) FROM USUARIOS_EN_LINEA WHERE VISP_CODIGO = 'pbm' AND SER_CODIGO = 'Teletrabajo_PBM' AND '0380' LIKE VISP_SER_VALID_DNIS': SQL Timeout

HERE'S WHERE RADIATOR DIED

Wed Jan 30 01:25:58 2002: ERR: Could not connect to SQL database with DBI->connect dbi:Oracle:host=db;sid=RADP, oraUser, oraPassword: timeout at /usr/local/lib/perl5/site_perl/5.6.1/Radius/Util.pm line 507, line 20.
Wed Jan 30 01:25:58 2002: ERR: Could not connect to any SQL database. Request is ignored. Backing off for 0 seconds
Wed Jan 30 01:25:58 2002: DEBUG: Reclaiming expired leases
Wed Jan 30 01:25:58 2002: DEBUG: do query is: UPDATE POOL_IP SET OCUPADA = 0, TIME_STAMP = 1012364758 WHERE OCUPADA != 0 AND EXPIRA < 1012364758
Wed Jan 30 01:26:53 2002: WARNING: Unknown service name
Wed Jan 30 01:26:53 2002: INFO: Server started: Radiator 2.18.4 on radius1
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:43 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 243092869, snmp-community
Wed Jan 30 01:31:45 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 243092869, snmp-community
Wed Jan 30 01:36:43 2002: DEBUG: SNMPAgent: received request from 192.168.1.2, 128, 261688663, snmp-community
Wed Jan 3
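
The timestamps in the log above already show where the time goes: each SQL step is followed by a long silence before the next entry. A minimal sketch for spotting such stalls automatically, assuming the log file has one timestamped entry per line in the "Wed Jan 30 01:22:15 2002: LEVEL: message" form shown above. The function name `find_stalls` and the 10-second threshold are illustrative choices, not anything Radiator provides.

```python
import re
from datetime import datetime

# Matches "Wed Jan 30 01:22:15 2002: DEBUG: message" style entries.
STAMP = re.compile(
    r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}): (\w+): (.*)"
)


def find_stalls(lines, threshold=10):
    """Return (gap_seconds, message) for each entry followed by a long pause.

    `message` is the entry logged just before the pause, which is usually
    the operation that was slow. Lines without a timestamp (packet dump
    bodies, wrapped queries) are skipped.
    """
    stalls = []
    previous = None  # (timestamp, message) of the last timestamped entry
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        if previous is not None:
            gap = (when - previous[0]).total_seconds()
            if gap >= threshold:
                stalls.append((gap, previous[1]))
        previous = (when, match.group(3))
    return stalls
```

Run over the excerpt above, this kind of check would flag the port-limit `SELECT COUNT(*)` query, which was logged at 01:23:16 and only failed with "SQL Timeout" at 01:24:16.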
Re: (RADIATOR) Radiator going down after Oracle SQL Timeout
Hi John,

Thanks for your excellent advice... The point with this specific setup
is that:

1) I don't have access to the Oracle server (we are contractors and only
   have access to the Radius servers "on demand", and I don't even have a
   unix login on the Oracle server).

2) The authentication process is far more complicated than
   "username/password + a bunch of attributes" and involves port limit
   checks, dynamic services (where some reply-attributes are bound to a
   service, not to a user, and some others are inferred from the service,
   not the user), dynamic IP address pools and the like.

3) Since the service is "re-sold" by my client to several other
   customers, they want to enforce all of these dynamic checks.

Oracle falls down relatively rarely and performance is, at least, quite
adequate (when it's working, obviously). I shortened the back-off time to
one minute (they have a kind of "cold/warm Oracle stand-by" so, when it
goes down, it comes back up quite fast).

My question was about why Radiator was falling down in those cases
(rather than simply backing off and not authenticating for a while).

As it took rather long for me to get a login to the servers (because my
customer didn't think it was critical, since the service WAS up all of
the time), I couldn't see the "supervise" logs (because they are rotated
and erased automatically). These logs capture all of the standard output
and standard error from Radiator, timestamping it.

I increased the size and number of logs, so I will be able to see them
the next time it happens (I hope).

On 15 Dec 2001 at 18:43, John Coy wrote:

> > Hello Mariano -
> >
> > What you describe below sounds to me like a problem with the
> > DBD-Oracle module. I would suggest that you try to use the
> > "restartWrapper" program that we provide in the distribution
> > ("goodies/restartWrapper") instead of "supervise" (at least for
> > debugging this problem). The restartWrapper program can be set up
> > with a delay before restarting, and it can also be configured to
> > email a designated email address with the exit status and any error
> > messages that were written to stderr. We should then be able to see
> > what is causing Radiator to die.
> >
> > regards
> >
>
> Hugh's answer is a good one, although you mentioned you're already
> using a daemon-restart tool.
>
> I'm replying because I am running about the same configuration you
> are -- two RADIUS daemons (one for auth, the other for accounting) and
> using an Oracle database for the back-end. Here's my approach for
> handling the database being offline.
>
> 1) Check to see why your database is offline =) If it's going down
>    often enough you might want to spend some time troubleshooting
>    that. (ok, so that's obvious, but I thought I'd state it anyhow
>    just in case :-)
>
> 2) Export your Oracle password file to a flat-file equivalent on a
>    regular basis. My export simulates a traditional UNIX shadow file.
>
> This will allow you to use the <AuthBy GROUP> to chain together a
> couple of <AuthBy> clauses. The first <AuthBy> clause will query your
> SQL server, the second <AuthBy> will query the flat file if the SQL
> server is unavailable. An example cut out of my "auth.cfg" file:
>
> #
> # The <AuthBy GROUP> statement allows me to bundle the SQL
> # authentication with the UNIX-style authentication in case the
> # SQL server is down. SQL authentication is preferred and takes
> # precedence.
> #
> <AuthBy GROUP>
>         Identifier ANCI-AuthSQLorUNIXPasswd
>         AuthByPolicy ContinueWhileIgnore
>
>         AuthBy ANCI-AuthSQLPasswd
>         AuthBy UNIX
> </AuthBy>
>
> <AuthBy SQL>
>         Identifier ANCI-AuthSQLPasswd
>
>         ... all your SQL auth stuff here ...
> </AuthBy>
>
> <AuthBy UNIX>
>         Identifier UNIX
>         Filename /usr/local/etc/shadow
>         GroupFilename /usr/local/etc/group
> </AuthBy>
>
> This gives you a bit of redundancy in case your Oracle database goes
> offline. Be sure your export routine does not clobber the file with
> empty data if it cannot read from the database (or else you're back
> where you started).
>
> Hope that helps.
>
> John
> Arkansas.Net

--
Mariano Absatz
El Baby
--
Nobody has ever, ever, EVER learned all of WordPerfect.

===
Archive at http://www.open.com.au/archives/radiator/
Announcements on [EMAIL PROTECTED]
To unsubscribe, email '[EMAIL PROTECTED]' with
'unsubscribe radiator' in the body of the message.
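
John's warning about the export routine clobbering the flat file can be illustrated with a short sketch. It assumes the export has already produced a list of shadow-style lines; how those lines are fetched from Oracle is left out, and the function name `write_export_atomically` and the `min_entries` parameter are hypothetical. The idea is to refuse to write when the export looks empty, and otherwise to write a temporary file and rename it over the old one so that a reader never sees a half-written file.

```python
import os
import tempfile


def write_export_atomically(path, lines, min_entries=1):
    """Replace `path` with `lines` only if the export looks sane.

    Returns True if the file was replaced, False if it was left untouched
    (for example because the database was down and the export is empty).
    """
    entries = [line.rstrip("\n") for line in lines if line.strip()]
    if len(entries) < min_entries:
        # Nothing usable came back: keep the previous export in place.
        return False

    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so that the final
    # rename stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".export-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(entries) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temporary file and re-raise the error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

A cron job could call this after each export attempt; when it returns False, the previous flat file stays in place and the fallback authentication keeps working with slightly stale data.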
RE: (RADIATOR) Radiator going down after Oracle SQL Timeout
The question is: when you add one more server to the "back farm", why do
you do it? Because you can't process enough RADIUS requests, or because
the servers themselves are overloaded?

The most important factor is, usually, the database. What are you using?
SQL? DBM? LDAP? /etc/passwd? Where does it reside?

A happy new year to you too!

On 14 Dec 2001 at 14:23, Harrison Ng wrote:

> Hello Mariano,
>
> Your Radiator splitting method sounds interesting to us.
> We will try it in our test plant and gain some experience with it.
>
> What we are doing here is using two Radiators to proxy
> auth/acct requests to a bunch of Radiator servers.
> We add more Radiator servers to the bunch for scaling.
> Eventually we have to manage too many Linux boxes.
> This costs us a lot of administrative overhead and money
> for maintenance.
>
> Merry Christmas and Happy New Year.
>
> Regards,
> Harrison
>
>
> -----Original Message-----
> From: Mariano Absatz [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 13, 2001 9:28 PM
> To: Harrison Ng
> Cc: Radiator List
> Subject: RE: (RADIATOR) Radiator going down after Oracle SQL Timeout
>
>
> Well,
>
> I think this was discussed quite a few times in the list and was
> recommended by Hugh.
>
> The point is, precisely, the "single-thread-ness" of Radiator
> (inherited from the still-unstable state of Perl's multi-threading).
>
> While Radiator IS really fast, the databases it interfaces with are
> not necessarily fast (nor available, as the problem I had shows).
>
> In my case, I'm using an Oracle database to authenticate users and
> also to store accounting records and on-line users. For now, these all
> reside in the same database in the same host (not the same host that
> is running Radiator), but I designed it so it can scale and
> functionally divide the databases.
Re: (RADIATOR) Radiator going down after Oracle SQL Timeout
I'm logging Radiator's stdout/stderr, but the logger had rotated and
erased it by the time I got access to the host (it's a customer's host
and, when there's nothing 'broken', I don't get access very quickly).

I increased the log size and number, and the next time it happens I'll
send the logs.

On 13 Dec 2001 at 12:13, Hugh Irvine wrote:

>
> Hello Mariano -
>
> What you describe below sounds to me like a problem with the DBD-Oracle
> module. I would suggest that you try to use the "restartWrapper"
> program that we provide in the distribution ("goodies/restartWrapper")
> instead of "supervise" (at least for debugging this problem). The
> restartWrapper program can be set up with a delay before restarting,
> and it can also be configured to email a designated email address with
> the exit status and any error messages that were written to stderr. We
> should then be able to see what is causing Radiator to die.
>
> regards
>
> Hugh
>
>
> On Thu, 13 Dec 2001 08:14, Mariano Absatz wrote:
> > Hi,
> >
> > I'm having the following problem:
> >
> > I'm using Radiator (2.18.4) and have all of my data on a remote
> > Oracle (8.1.6) server.
> >
> > Both machines are Sun Netra with Solaris 8. Perl version is 5.6.1.
> >
> > There are two instances of Radiator (one for authentication and the
> > other for accounting).
> >
> > The problem is the following. If the Oracle server goes down, the
> > queries time out (that's reasonable). The point is that sometimes
> > (not after every SQL timeout, but after some of them), Radiator goes
> > down. It seems that this happens when the query in question is
> > necessary as part of the authentication (e.g. during a username
> > lookup or simultaneous use or port limit check), but not when it is
> > nonessential (such as a deletion from the radonline table for the
> > nas/port recently received or an insertion in an AuthLog).
> >
> > On only one occasion I saw the "Could not connect to any SQL
> > database. Request is ignored. Backing off for 600 second" message,
> > but even that time, Radiator went down.
> >
> > I'm using daemontools' supervise (http://cr.yp.to/daemontools.html)
> > to keep the servers running, so the server starts up again almost
> > immediately. I see the messages when it is starting again in the log.
> >
> > The question is, why is Radiator silently shutting down rather than
> > backing off?
> >
> > One of the main problems is that on the almost immediate restart, the
> > first thing Radiator tries to do is to read the client list from the
> > database. If Oracle is still down, it won't read it, it won't retry,
> > and (since there are no hardwired <Client>'s in the config file) it
> > won't accept anything from any NAS.
> >
> > Regretfully, supervise's log is autorotated and autoerased on a size
> > basis and I don't have the output to correlate with Radiator's log.
> >
> > I'm attaching parts of the logs showing the SQL Timeout error
> > immediately followed by Radiator starting up again (via supervise).
> >
> > The "DEBUG: Adding Clients from SQL database" is the first message
> > issued by a NEW Radiator starting.
> >
> > I'm also attaching the whole set of configuration files (the main one
> > is radius-main.cfg) in a zip file.
>
> --
> Radiator: the most portable, flexible and configurable RADIUS server
> anywhere. Available on *NIX, *BSD, Windows 95/98/2000, NT, MacOS X.
> -
> Nets: internetwork inventory and management - graphical, extensible,
> flexible with hardware, software, platform and database independence.

--
Mariano Absatz
El Baby
--
A woman's favorite position is CEO.
RE: (RADIATOR) Radiator going down after Oracle SQL Timeout
Well,

I think this was discussed quite a few times in the list and was
recommended by Hugh.

The point is, precisely, the "single-thread-ness" of Radiator (inherited
from the still-unstable state of Perl's multi-threading).

While Radiator IS really fast, the databases it interfaces with are not
necessarily fast (nor available, as the problem I had shows).

In my case, I'm using an Oracle database to authenticate users and also
to store accounting records and on-line users. For now, these all reside
in the same database in the same host (not the same host that is running
Radiator), but I designed it so it can scale and functionally divide the
databases.

But even with everything in the same host, by splitting Radiator into
separate authentication and accounting processes, database delays while
querying the authentication tables don't stop Radiator's accounting
process from receiving and storing accounting records and maintaining
the on-line users table, and vice versa.

If I found that the process was still too slow and the culprit was the
database, I might even be tempted to leave 2 Radiator instances listening
on the standard ports for authentication and accounting records and have
them load-balance among a bunch of authentication and accounting Radiator
processes, all running on non-standard ports on the same host.

On 13 Dec 2001 at 10:48, Harrison Ng wrote:

> Hello Mariano,
>
> Do you mind telling me the purpose of running
> two instances of Radiator on the same unix box?
>
> I've heard that Radiator is a single-threaded Perl application,
> so it can't fully utilize system resources effectively.
>
> Harrison
> SmarTone BroadBand Services Ltd.
Re: (RADIATOR) Radiator going down after Oracle SQL Timeout
Hello Mariano -

What you describe below sounds to me like a problem with the DBD-Oracle
module. I would suggest that you try to use the "restartWrapper" program
that we provide in the distribution ("goodies/restartWrapper") instead of
"supervise" (at least for debugging this problem). The restartWrapper
program can be set up with a delay before restarting, and it can also be
configured to email a designated email address with the exit status and
any error messages that were written to stderr. We should then be able to
see what is causing Radiator to die.

regards

Hugh


On Thu, 13 Dec 2001 08:14, Mariano Absatz wrote:
> Hi,
>
> I'm having the following problem:
>
> I'm using Radiator (2.18.4) and have all of my data on a remote Oracle
> (8.1.6) server.
>
> Both machines are Sun Netra with Solaris 8. Perl version is 5.6.1.
>
> There are two instances of Radiator (one for authentication and the
> other for accounting).
>
> The problem is the following. If the Oracle server goes down, the
> queries time out (that's reasonable). The point is that sometimes (not
> after every SQL timeout, but after some of them), Radiator goes down.
> It seems that this happens when the query in question is necessary as
> part of the authentication (e.g. during a username lookup or
> simultaneous use or port limit check), but not when it is nonessential
> (such as a deletion from the radonline table for the nas/port recently
> received or an insertion in an AuthLog).
>
> On only one occasion I saw the "Could not connect to any SQL database.
> Request is ignored. Backing off for 600 second" message, but even that
> time, Radiator went down.
>
> I'm using daemontools' supervise (http://cr.yp.to/daemontools.html) to
> keep the servers running, so the server starts up again almost
> immediately. I see the messages when it is starting again in the log.
>
> The question is, why is Radiator silently shutting down rather than
> backing off?
>
> One of the main problems is that on the almost immediate restart, the
> first thing Radiator tries to do is to read the client list from the
> database. If Oracle is still down, it won't read it, it won't retry,
> and (since there are no hardwired <Client>'s in the config file) it
> won't accept anything from any NAS.
>
> Regretfully, supervise's log is autorotated and autoerased on a size
> basis and I don't have the output to correlate with Radiator's log.
>
> I'm attaching parts of the logs showing the SQL Timeout error
> immediately followed by Radiator starting up again (via supervise).
>
> The "DEBUG: Adding Clients from SQL database" is the first message
> issued by a NEW Radiator starting.
>
> I'm also attaching the whole set of configuration files (the main one
> is radius-main.cfg) in a zip file.

--
Radiator: the most portable, flexible and configurable RADIUS server
anywhere. Available on *NIX, *BSD, Windows 95/98/2000, NT, MacOS X.
-
Nets: internetwork inventory and management - graphical, extensible,
flexible with hardware, software, platform and database independence.
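
The behaviour Hugh describes (restart after a delay, and report the exit status together with whatever was written to stderr) can be sketched in a few lines. This is not the restartWrapper program from the Radiator distribution, only an illustration of the idea; the names `run_once`, `supervise` and `report_to_stderr` are made up for this example, and the reporting callback stands in for the email step.

```python
import subprocess
import sys
import time


def report_to_stderr(message):
    """Default reporter; a real setup might send an email instead."""
    print(message, file=sys.stderr)


def run_once(cmd):
    """Run cmd to completion and return (exit_status, captured_stderr)."""
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    return proc.returncode, proc.stderr


def supervise(cmd, delay=10.0, max_restarts=3, report=report_to_stderr):
    """Run cmd repeatedly, pausing `delay` seconds between runs.

    Each exit is reported with its status and stderr output, which is the
    information needed to see why the server died.
    """
    history = []
    for attempt in range(1, max_restarts + 1):
        status, err = run_once(cmd)
        history.append((status, err))
        report(f"run {attempt}: exit status {status}, stderr: {err.strip()!r}")
        if attempt < max_restarts:
            # Waiting before the restart gives the database time to recover,
            # so the fresh server does not start against a dead backend.
            time.sleep(delay)
    return history
```

The delay matters for the problem described above: an immediate restart while Oracle is still down means the new process fails to read its client list, whereas a pause makes it more likely that the database is back by then.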