Thank you both. These are both very helpful answers; we have some fires right now; so I can't actually follow up just now.
--- Begin Message ---
On Thu, 18 Jul 2019 at 15:22, Eric W. Bates <[email protected]> wrote:
>
> Here's the scoop. I'm asking because this took me several months to
> notice the problem and diagnose and I'm wondering whether this is a
> logging bug or whether there are additional logging functions I can
> enable. I'm still using an old version of Radiator: 4.15 (so that may be
> my problem).

There is LogIgnore in newer versions of Radiator which helps to log
requests which have been ignored for some reason. There's also
EAPTLS_TraceState, which helps to trace PEAP/TTLS authentication
problems and ResponseTimeThreshold (mentioned here
https://www.slideshare.net/radiatorsoftware/tnc19-radiator-technical-workshop-meet-radiator-developers
), which helps to track how long request have been processed.

> I added a MySQL backup using automysqlbackup from the main Ubuntu
> repository and have been successfully making daily dumps for about 9
> months. By default this runs at 06:25 every morning. The accounting
> tables are quite large (I'm in the process of adding automatic pruning);
> so the dump takes about 20 minutes.

Have you monitored what is the load of the server during the backup?
>From your debug log timestamps it looks like that even very simple
authentications steps have been 2 or more minutes apart from each
other.

Do you run only single Radiator process/instance?

> Occasionally I have received sporadic complaints of authentication
> failures during that time window. When I have investigated, I have found
> no information in my trace level 4 logs that would indicate any database
> issue except for reconnection notices after the backup has completed. In
> addition, there are no packet dumps showing connection attempts; so I
> put the blame on my VPN.

I have a working theory about why this happens, but need a bit more
information from you to verify some of the environment/configuration
things.

What I think happens is that first your database backup starts. Host
and or database becomes unresponsive or slow to respond because of the
load. The database connection however does not timeout or go away.
Radiator gets a request (which looks like PEAP) and tries to query the
database. The query is sent to database, which does not respond or
responds only after significant delay. Radiator as a single thread
application sits there waiting for database to respond, but the
database either does not do that or responds so slowly that the RADIUS
clients, which originally sent the request, timeout.

Why this does not show in your Radiator version logs is probably the
combination of older Radiator version, PEAP and possibly AuthBy
RADMIN. We have increased logging features in the Radiator versions
since 4.15 to catch more easily this kind of errors. The earlier
versions of Radiator did not for example log if PEAP authentication
was interrupted (for example because of certificate problems, client
going away etc.). This might occur in your case so that first some
PEAP authentication is started and because database does not respond,
it is not finished probably because of the MSCHAP-V2
challenge/response cycle. As PEAP does not reach a decision, in older
Radiator, it is not logged to be timeoutted. The database on the other
hand may still have the connection up and Radiator waiting for
response so no error about not able to connect to database is logged
either. To the NAS client Radiator looks like a black hole as it does
not respond to its requests, so you get timeouts.

What kind of command line options are you using on radpwtst probe?
Does it do the PEAP/MSCHAP-V2 request mentioned there in your logs or
something standard RADIUS?

> I have set up Nagios to authenticate (via radpwtest) every 5 minutes and
> it does, in fact, report failures during the SQL backup, but again,
> there is ~0~ evidence in the logs that there is any trouble or even
> attempts to authenticate.
>
> Here's the Nagios log from Wednesday morning:
>    (date -d @1563345993 == Wed Jul 17 02:46:33 EDT 2019)
>
> [1563345993] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346005] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346104] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346106] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346117] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346118] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346127] SERVICE ALERT: radius2;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346130] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346230] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346231] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;2;UNKNOWN
> No reply
> [1563346242] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
>
>    (date -d @1563346242 == Wed Jul 17 02:50:42 EDT 2019)
>
> During this time backup interval, the Nagios test simply times out.
> Radiator gives no reply.
>
> At the same time in the level 4 Radiator trace (this log is also EDT):
>
> Wed Jul 17 02:45:39 2019 038925: DEBUG: PEAP Tunnelled request Packet dump:
> Code:       Access-Request
> Identifier: UNDEF
> Authentic:  SB<128>R<145><190>;A'b<10><252>i%X<218>
> Attributes:
>         EAP-Message = <2>`<0><18><1>[email protected]
>         Message-Authenticator = 
> <0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0>
>         NAS-IP-Address = 192.168.0.183
>         NAS-Identifier = "SafetyOffice"
>         NAS-Port = 0
>         Calling-Station-Id = "24-F0-94-D0-E7-DF"
>         User-Name = ""
>
> Wed Jul 17 02:45:39 2019 039238: DEBUG: Handling request with Handler
> 'TunnelledByPEAP = 1, NAS-IP-Address = /^192\.168\.0.*/', Identifier
> 'TunnelledByPeap-IP'
> Wed Jul 17 02:47:41 2019 420670: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:49:41 2019 471263: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:51:41 2019 475543: DEBUG: Handling with EAP: code 2, 96, 18, 1
> Wed Jul 17 02:53:41 2019 537452: DEBUG: Response type 1
> Wed Jul 17 02:55:41 2019 872823: DEBUG: EAP result: 3, EAP MSCHAP-V2
> Challenge
> Wed Jul 17 02:55:42 2019 298580: DEBUG: AuthBy RADMIN result: CHALLENGE,
> EAP MSCHAP-V2 Challenge
> Wed Jul 17 02:55:42 2019 298821: DEBUG: Access challenged for : EAP
> MSCHAP-V2 Challenge

If you look at timestamps there are several minutes between lines even
without accessing the database? Do you have any information about the
load of the system?

> Not only is there no log indicating that the DB has timed out, there is
> no log of the authentication attempts by Nagios. The log is basically
> empty for the time window of the DB backup.
>
> After the backup is complete, there is a log message:

This looks like my working theory may be correct. Not responding
database with PEAP and MSCHAP-V2 jams the Radiator process while it
tries to process the request and during that time Radiator does not
process any other requests, which explain why there are no log entries
either.

> Wed Jul 17 02:55:42 2019 920915: DEBUG: Connecting to
> 'dbi:mysql:radmin:whoiradius-255.whoinet.mydomain.edu'
>
> Is that really the only indication that the DB has gone away?

No, if database goes away so that it cannot be connected, there are
complaints about it and reconnection messages. But if the Radiator
process jams to wait for the database to respond, then there are no
log entries either.

br,

// kh

> On 7/17/19 5:07 AM, Karri Huhtanen wrote:
> > On Tue, 16 Jul 2019 at 14:30, Eric W. Bates <[email protected]> wrote:
> >>
> >> I'm using MySQL as my backend for radiator and I think I'm having SQL
> >> timeouts whenever the database is backed up.
> >>
> >> Does Radiator log SQL errors and/or how do I get those to show up?
> >
> > Radiator only logs SQL errors when it tried to access the database and
> > cannot do that for some reason. These errors are logged in the main
> > Radiator logfile at least with Trace level 3. If you are trying to
> > catch the timeout I think setting up radpwtst to try to authenticate
> > every second might help to generate SQL requests also at the your
> > database is backed up. You can use for example cron or  watch and
> > screen to setup this kind of test. Please ensure that the
> > authentication request is that kind of authentication which makes
> > Radiator to access the database, otherwise this test does not help
> > you.
> >
> > br,
> >
> > // kh
> >
>
>


--- End Message ---
--- Begin Message ---
Hi Eric,

I think you may be running into the issue that automysqlbackup uses
mysqldump underneath the hood.

The default behaviour of mysqldump is to lock all tables within a
database during the dump (to ensure the dump is consistent).

This means that the database is to all intents and purposes "up",
however any queries will simply be blocked by the database engine.  This
fits in with the behaviour you're seeing.

You may be able to pass in an option, or hack the automysqlbackup script
to pass the "--skip-lock-tables" option to mysqldump.  Whether it makes
sense to do this is highly situational and you'd have to be aware that
not locking tables during a dump could end up with inconsistencies in
your backup.

If there are multiple tables in your database, you could perhaps try
setting things up the back up one table at a time, which may help.

However, if your database is going to remain large then I'd suggest
looking at one of the more robust backup options that are able to
support snapshot type backups of the database that don't require table
locks etc during the backup.

We also have a large Oracle database infrastructure in addition to our
MySQL and PostgreSQL database infrastructure, so we leverage Oracl'es
RMAN tool to get the job done for us.

An alternative would be to back up the actual binary database files (a
"physical" backup), rather than using mysqldump (a "logical" backup).
This is a lot more complicated to restore should that eventuality come
to pass, but would get around your locked tables issue.

Hope that helps in some way.

Regards,

Martin.


On 18/07/2019 13:22, Eric W. Bates wrote:
> Here's the scoop. I'm asking because this took me several months to
> notice the problem and diagnose and I'm wondering whether this is a
> logging bug or whether there are additional logging functions I can
> enable. I'm still using an old version of Radiator: 4.15 (so that may be
> my problem).
> 
> 
> I added a MySQL backup using automysqlbackup from the main Ubuntu
> repository and have been successfully making daily dumps for about 9
> months. By default this runs at 06:25 every morning. The accounting
> tables are quite large (I'm in the process of adding automatic pruning);
> so the dump takes about 20 minutes.
> 
> Occasionally I have received sporadic complaints of authentication
> failures during that time window. When I have investigated, I have found
> no information in my trace level 4 logs that would indicate any database
> issue except for reconnection notices after the backup has completed. In
> addition, there are no packet dumps showing connection attempts; so I
> put the blame on my VPN.
> 
> I have set up Nagios to authenticate (via radpwtest) every 5 minutes and
> it does, in fact, report failures during the SQL backup, but again,
> there is ~0~ evidence in the logs that there is any trouble or even
> attempts to authenticate.
> 
> Here's the Nagios log from Wednesday morning:
>   (date -d @1563345993 == Wed Jul 17 02:46:33 EDT 2019)
> 
> [1563345993] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346005] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346104] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346106] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346117] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346118] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346127] SERVICE ALERT: radius2;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346130] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346230] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346231] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;2;UNKNOWN
> No reply
> [1563346242] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> 
>   (date -d @1563346242 == Wed Jul 17 02:50:42 EDT 2019)
> 
> During this time backup interval, the Nagios test simply times out.
> Radiator gives no reply.
> 
> At the same time in the level 4 Radiator trace (this log is also EDT):
> 
> Wed Jul 17 02:45:39 2019 038925: DEBUG: PEAP Tunnelled request Packet dump:
> Code:       Access-Request
> Identifier: UNDEF
> Authentic:  SB<128>R<145><190>;A'b<10><252>i%X<218>
> Attributes:
>     EAP-Message = <2>`<0><18><1>[email protected]
>     Message-Authenticator =
> <0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0>
>     NAS-IP-Address = 192.168.0.183
>     NAS-Identifier = "SafetyOffice"
>     NAS-Port = 0
>     Calling-Station-Id = "24-F0-94-D0-E7-DF"
>     User-Name = ""
> 
> Wed Jul 17 02:45:39 2019 039238: DEBUG: Handling request with Handler
> 'TunnelledByPEAP = 1, NAS-IP-Address = /^192\.168\.0.*/', Identifier
> 'TunnelledByPeap-IP'
> Wed Jul 17 02:47:41 2019 420670: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:49:41 2019 471263: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:51:41 2019 475543: DEBUG: Handling with EAP: code 2, 96,
> 18, 1
> Wed Jul 17 02:53:41 2019 537452: DEBUG: Response type 1
> Wed Jul 17 02:55:41 2019 872823: DEBUG: EAP result: 3, EAP MSCHAP-V2
> Challenge
> Wed Jul 17 02:55:42 2019 298580: DEBUG: AuthBy RADMIN result: CHALLENGE,
> EAP MSCHAP-V2 Challenge
> Wed Jul 17 02:55:42 2019 298821: DEBUG: Access challenged for : EAP
> MSCHAP-V2 Challenge
> 
> Not only is there no log indicating that the DB has timed out, there is
> no log of the authentication attempts by Nagios. The log is basically
> empty for the time window of the DB backup.
> 
> After the backup is complete, there is a log message:
> 
> Wed Jul 17 02:55:42 2019 920915: DEBUG: Connecting to
> 'dbi:mysql:radmin:whoiradius-255.whoinet.mydomain.edu'
> 
> 
> Is that really the only indication that the DB has gone away?
> 
> 
> On 7/17/19 5:07 AM, Karri Huhtanen wrote:
>> On Tue, 16 Jul 2019 at 14:30, Eric W. Bates <[email protected]> wrote:
>>>
>>> I'm using MySQL as my backend for radiator and I think I'm having SQL
>>> timeouts whenever the database is backed up.
>>>
>>> Does Radiator log SQL errors and/or how do I get those to show up?
>>
>> Radiator only logs SQL errors when it tried to access the database and
>> cannot do that for some reason. These errors are logged in the main
>> Radiator logfile at least with Trace level 3. If you are trying to
>> catch the timeout I think setting up radpwtst to try to authenticate
>> every second might help to generate SQL requests also at the your
>> database is backed up. You can use for example cron or  watch and
>> screen to setup this kind of test. Please ensure that the
>> authentication request is that kind of authentication which makes
>> Radiator to access the database, otherwise this test does not help
>> you.
>>
>> br,
>>
>> // kh
>>
> 
> 
> 
> _______________________________________________
> radiator mailing list
> [email protected]
> https://lists.open.com.au/mailman/listinfo/radiator
> 

-- 
Martin Burton
Principal Systems Administrator            \\\|||///
Infrastructure Team                       \\  ^ ^  //
Wellcome Sanger Institute                  (  6 6  )
-----------------------------------------oOOo-(_)-oOOo---
t: +44 (0)1223 496945             http://www.sanger.ac.uk
Extreme Networks Specialist:      a1780000003uG1BAAU

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
radiator mailing list
[email protected]
https://lists.open.com.au/mailman/listinfo/radiator

--- End Message ---

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

_______________________________________________
radiator mailing list
[email protected]
https://lists.open.com.au/mailman/listinfo/radiator

Reply via email to