(RADIATOR) AuthBy LDAP and AuthBy LDAP2: Radiator hang if fallback LDAP server is down

Holger Meyer Thu, 15 Feb 2001 02:36:19 -0800
Hello folks,

We ran into a problem with Radiator-2.17.1 that I suspect lies in its
AuthBy LDAP and AuthBy LDAP2 modules: Radiator hangs when it tries to
connect to a remote LDAP server that is not available.

Problem:
========
Radiator hangs for about 4 minutes when it tries to open an LDAP connection
to the second of two LDAP servers defined in a handler section of its
radius.cfg file and that server is not reachable on the network. Radiator
does not issue a REJECT (or similar) immediately after the connection
attempt to the second LDAP server fails. Instead, it tries to resolve the
same client request a second time against the same, still unreachable, LDAP
server by connecting to that server again.


System description:
===================
Radiator-2.17.1 installed with perl-ldap-0.22.
Radiator is running on server A. Server A also runs an LDAPv3-compliant
directory service, which is the Slave directory of an LDAP Master directory
on server B. User data is replicated from the Master to the Slave every
5 minutes.
Server B is in fact a high-availability cluster made up of two identical
physical servers that together form one virtual server, i.e. server B.
Both server A and server B are running Sun Solaris 2.6.

radius.cfg contains handler sections of the following format (server A's
IP-address is xx.xx.xx.xxx, server B's yy.yy.yy.yyy):

---- SNIPP ------------------------
# v.*-out User
<Handler User-Name = /^v.*-out$/i>
        AuthByPolicy ContinueUntilAccept
        <AuthBy LDAP2>
                Host            xx.xx.xx.xxx
                Port            389
                AuthDN          cn=Manager1, o=Company, c=DE
                AuthPassword    xxxxx
                BaseDN          cn=Router, cn=Application Services, o=Company, c=de
                UsernameAttr    commonName
                PasswordAttr    cdsDEradiatorPassword1
                ReplyAttr       cdsDEradiatorConfiguration1
        </AuthBy>

        <AuthBy LDAP2>
                Host            yy.yy.yy.yyy
                Port            389
                AuthDN          cn=Manager, c=de
                AuthPassword    yyyyy
                BaseDN          cn=Router, cn=Application Services, o=Company, c=de
                UsernameAttr    commonName
                PasswordAttr    cdsDEradiatorPassword1
                ReplyAttr       cdsDEradiatorConfiguration1
        </AuthBy>

        # Log accounting to the detail file in LogDir
        AcctLogFileName /etc/raddb/detail

</Handler>
---- SNIPP END ------------------------

Problem report:
===============
Client requests are first resolved by querying server A (IP address
xx.xx.xx.xxx), which is the first server specified in the radius.cfg handler
section. If the request is denied, because of a bad password or similar, the
request is sent to server B (IP address yy.yy.yy.yyy), the second LDAP
server configured in the handler section, since AuthByPolicy
ContinueUntilAccept is set. This AuthByPolicy is used because server A only
holds a replicated shadow copy of the LDAP database, while server B keeps
the Master records. Replication occurs every 5 minutes, so server A may not
yet know about a new user or a changed password while the Master directory
on server B already has the more recent data. The AuthByPolicy lets Radiator
check the Master directory for such fresh data. Radiator also needs to fall
back to server B (the second in the handler) if the LDAP service on server A
is not available because of some kind of failure.
With the radius.cfg handler setup given above, the switch from server A to
server B works fine when the LDAP service on server A has failed. In that
case Radiator first tries to resolve requests against server A's LDAP
database, which fails, and then calls server B's service, which answers the
query.
However, if server A is up and running, server B is down, and the LDAP
service on server A answers a request with "REJECTED" because of a bad
password or an unknown user, Radiator calls the LDAP service on server B to
check whether that server's database can resolve the request. At this point
Radiator runs into a 2 minute timeout while it tries to create a new LDAP
session object for the connection to the unavailable server B. The 2 minute
timeout is the default that Net::LDAP uses when it creates a socket
connection to an LDAP server. During those 2 minutes Radiator does nothing
and just hangs. Since the Radiator clients are configured with a timeout of
5 seconds, they time out long before the 2 minute timeout is over. The
relevant code in AuthBy LDAP2 is the "new Net::LDAP()" call in the following
excerpt from the "reconnect" function:

----- SNIPP -------------------------
sub reconnect
{
    my ($self) = @_;

    # Some LDAP servers (notably imail) disconnect us after an unbind
    # so we see if we are still connected now
    if ($self->{ld} && !getpeername($self->{ld}->{net_ldap_socket}))
    {
        close($self->{ld}->{net_ldap_socket});
        $self->{ld} = undef;
    }

    return 1 if $self->{ld}; # We are already connected

    my $result;
    my $host = &Radius::Util::format_special($self->{Host});
    $self->log($main::LOG_DEBUG, "Connecting to $host, port $self->{Port}");
    if (!($self->{ld} = new Net::LDAP
          ($host,
           port => Radius::Radius::get_port($self->{Port}))))
    {
        $self->log($main::LOG_ERR,
                   "Could not open LDAP connection to $host, port $self->{Port}");
        return 0;
    }
    $self->{ld}->debug($self->{Debug}) if $self->{Debug};

---- SNIPP END ------------------------


In Net::LDAP one can find:

------SNIPP of Net::LDAP---------------
sub new {
  my $self = shift;
  my $type = ref($self) || $self;
  my $host = shift if @_ % 2;
  my $arg  = &_options;
  my $obj  = bless {}, $type;

  my $sock = IO::Socket::INET->new(
               PeerAddr => $host,
               PeerPort => $arg->{port} || '389',
               Proto    => 'tcp',
               Timeout  => defined $arg->{timeout}
                             ? $arg->{timeout}
                             : 120
             ) or return;

  $sock->autoflush(1);

  $obj->{net_ldap_socket}  = $sock;
  $obj->{net_ldap_host}    = $host;
  $obj->{net_ldap_resp}    = {};
  $obj->{net_ldap_version} = $arg->{version} || $LDAP_VERSION;
  $obj->{net_ldap_async}   = $arg->{async} ? 1 : 0;

  if (defined(my $onerr = $arg->{onerror})) {
    $onerr = $onerror{$onerr} if exists $onerror{$onerr};
    $obj->{net_ldap_onerror} = $onerr;
  }

  $obj->debug($arg->{debug} || 0 );

  $obj;
}
----- SNIPP END --------------

The "Timeout" value defaults to 120 seconds because AuthBy LDAP does not
pass a different value, and that is exactly the timeout Radiator runs into.
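
To make the idea concrete, here is a minimal sketch of how a configurable
value could be handed through to the constructor. It only mirrors the
timeout selection visible in the Net::LDAP code quoted above. The
"Timeout" key on the AuthBy object and the helper names effective_timeout
and ldap_new_args are assumptions made for illustration; they are not
taken from Radiator.

```perl
use strict;
use warnings;

# Same rule as in the quoted Net::LDAP constructor: an explicit
# 'timeout' option wins, otherwise 120 seconds is used.
sub effective_timeout {
    my (%arg) = @_;
    return defined $arg{timeout} ? $arg{timeout} : 120;
}

# Hypothetical helper: build the option list that reconnect() could pass
# on as Net::LDAP->new($host, ldap_new_args($self)).
# ASSUMPTION: $self->{Timeout} would be filled from a new radius.cfg
# parameter; no such parameter is shown in the code quoted above.
# The real code also maps the port through Radius::Radius::get_port,
# which is left out here to keep the sketch self-contained.
sub ldap_new_args {
    my ($self) = @_;
    my @args = (port => $self->{Port});
    push @args, (timeout => $self->{Timeout}) if defined $self->{Timeout};
    return @args;
}

# Without a configured Timeout the 120 second default still applies;
# with one, the configured value reaches the socket constructor.
for my $authby ({ Port => 389 }, { Port => 389, Timeout => 5 }) {
    printf "effective timeout: %d s\n",
        effective_timeout(ldap_new_args($authby));
}
```

With a value below the 5 second client timeout, a failed connection
attempt would return before the client gives up.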

The next problem is that Radiator connects to server B a second time for
the same client request. It does not go back to server A but tries server B
again, even though the first attempt has already failed and server A is the
first server configured in the radius.cfg file. I assume that's because the
code section of AuthBy LDAP2

---- SNIPP  -------------------
    if (!($self->{ld} = new Net::LDAP
          ($host,
           port => Radius::Radius::get_port($self->{Port}))))
    {
        $self->log($main::LOG_ERR,
                   "Could not open LDAP connection to $host, port $self->{Port}");
        return 0;
    }
---- SNIPP END -----------------

returns 0 on failure, but the "reconnect" routine is then called once more
with the same hostname for the same request. This leads to the same error
and the same 2 minute timeout, resulting in an overall delay of 4 minutes.
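
One way the second attempt could be avoided is to remember a failed
connect for a short hold-down period and to fail fast during that period.
The following is a sketch of that idea only, not Radiator code: the
package name HoldDownConnector, its options, and the 60 second default
are all hypothetical, and the real connect call is replaced by a code
reference so the example runs without an LDAP server.

```perl
use strict;
use warnings;

package HoldDownConnector;   # hypothetical name, for illustration only

sub new {
    my ($class, %arg) = @_;
    my $self = {
        connect   => $arg{connect},   # code ref: returns a handle or undef
        hold_down => defined $arg{hold_down} ? $arg{hold_down} : 60,
        clock     => $arg{clock} || sub { time },
        failed_at => undef,           # time of the last failed attempt
        handle    => undef,
        attempts  => 0,               # number of real connection attempts
    };
    return bless $self, $class;
}

sub reconnect {
    my ($self) = @_;
    return 1 if $self->{handle};      # already connected

    my $now = $self->{clock}->();
    # Fail fast while the previous failure is still recent, so a second
    # call for the same request does not block on the socket timeout again.
    return 0
        if defined $self->{failed_at}
        && $now - $self->{failed_at} < $self->{hold_down};

    $self->{attempts}++;
    $self->{handle} = $self->{connect}->();
    if ($self->{handle}) {
        $self->{failed_at} = undef;
        return 1;
    }
    $self->{failed_at} = $now;
    return 0;
}

package main;

# Usage: a connector whose server is always down.
my $down = HoldDownConnector->new(connect => sub { return undef });
$down->reconnect for 1 .. 2;          # two calls, as in the trace
print "real attempts: $down->{attempts}\n";
```

Here the second call returns 0 immediately instead of waiting for another
socket timeout. The price is that a server that comes back is not noticed
until the hold-down period has passed.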

The following is an excerpt from the logfile at Trace 4 during such an error
condition. The IP address yy.yy.yy.yyy is again that of server B. It can be
seen that the request is first resolved against server A, resulting in a
REJECT because of a bad password. Next, server B is tried, which fails after
a 2 minute timeout. The connection attempt to server B is made twice, each
time running into the 2 minute timeout. Thus, handling this single request
takes 4 minutes. Radiator does not seem to be able to handle other client
requests during this time.


---- SNIPP  -----------------------
Wed Feb 14 22:44:47 2001: DEBUG: Packet dump:
*** Received from zz.zz.zz.z port 1645 ....
Code:       Access-Request
Identifier: 190
Authentic:  zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Attributes:
        NAS-IP-Address = zz.zz.zzz.z
        NAS-Port-Type = Async
        User-Name = "vd0153216-out"
        Calling-Station-Id = "Dial out"
        User-Password = zzzzzzzzzzzzzzzzzzzzzzzzzz
        Service-Type = Outbound-User

Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Rewrote user name to vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Check if Handler User-Name = /^offload1/
should be used to handle this request
Wed Feb 14 22:44:47 2001: DEBUG: Check if Handler User-Name = /^gamma1/
should be used to handle this request
Wed Feb 14 22:44:47 2001: DEBUG: Check if Handler User-Name = /^v.*-out$/i
should be used to handle this request
Wed Feb 14 22:44:47 2001: DEBUG: Handling request with Handler 'User-Name =
/^v.*-out$/i'
Wed Feb 14 22:44:47 2001: DEBUG: Handling with Radius::AuthLDAP2
Wed Feb 14 22:44:47 2001: DEBUG: LDAP got result for l=620-3035, l=620-0000,
ou=Sales, o=Company, c=DE
Wed Feb 14 22:44:47 2001: DEBUG: LDAP got cdsDEradiatorPassword2: zzzzzzz
Wed Feb 14 22:44:47 2001: DEBUG: LDAP got cdsDEradiatorConfiguration2:
cisco-avpair = "outbound:addr*zz.zz.zz.zz" cisco-avpair =
"outbound:dial-number=0011223344" cisco-avpair = "outbound:send-auth=2"
cisco-avpair = "outbound:send-secret=zzzzzzz" Service-Type = Outbound-User
Wed Feb 14 22:44:47 2001: DEBUG: Radius::AuthLDAP2 looks for match with
vd0153216-out
Wed Feb 14 22:44:47 2001: DEBUG: Radius::AuthLDAP2 REJECT: Bad Password
Wed Feb 14 22:44:47 2001: DEBUG: No entries for DEFAULT found in LDAP
database
Wed Feb 14 22:44:47 2001: DEBUG: Handling with Radius::AuthLDAP2
Wed Feb 14 22:44:47 2001: DEBUG: Connecting to yy.yy.yy.yyy, port 389
Wed Feb 14 22:46:47 2001: ERR: Could not open LDAP connection to
yy.yy.yy.yyy, port 389
Wed Feb 14 22:46:47 2001: DEBUG: Radius::AuthLDAP2 looks for match with
vd0153216-out
Wed Feb 14 22:46:47 2001: DEBUG: Connecting to yy.yy.yy.yyy, port 389
Wed Feb 14 22:48:47 2001: ERR: Could not open LDAP connection to
yy.yy.yy.yyy, port 389
Wed Feb 14 22:48:47 2001: INFO: Access rejected for vd0153216-out: No such
user
Wed Feb 14 22:48:47 2001: DEBUG: Packet dump:
*** Sending to zz.zz.zz.z port 1645 ....
Code:       Access-Reject
Identifier: 190
Authentic:  zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Attributes:
        Reply-Message = "Request Denied"

------ SNIPP END --------------


Tested with:
===========
Radiator-2.17.1 as well as Radiator-2.14.1. The problem occurs regardless of
whether <AuthBy LDAP> or <AuthBy LDAP2> is used.

Special note:
=============
The problem can only be observed if the whole of server B is unreachable. It
does not occur if the LDAP directory service on server B (the cluster) is
stopped but the host/cluster itself is still alive.
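
A plausible explanation, not verified against the setup described here: a
host that is alive but has nothing listening on the port refuses the TCP
connection at once, so the connect call fails immediately and the Timeout
value never comes into play. A host that is unreachable sends no answer
at all, so the connect call waits for the full Timeout. The sketch below
shows the first case using only core modules and the same
IO::Socket::INET call as the Net::LDAP constructor quoted earlier. The
helper names timed_connect and unused_local_port are made up for this
example.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw(time);

# Try a TCP connect and report (success flag, seconds spent).
sub timed_connect {
    my ($host, $port, $timeout) = @_;
    my $start = time;
    my $sock  = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    my $elapsed = time - $start;
    close $sock if $sock;
    return ($sock ? 1 : 0, $elapsed);
}

# Find a local port with no listener: bind to an ephemeral port,
# note its number, and close the socket again.
sub unused_local_port {
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 0,
        Proto     => 'tcp',
        Listen    => 1,
    ) or die "cannot bind a local port: $!";
    my $port = $listener->sockport;
    close $listener;
    return $port;
}

# "Host alive, service stopped": the connect is refused right away,
# far below the 120 second timeout that was passed in.
my ($ok, $elapsed) = timed_connect('127.0.0.1', unused_local_port(), 120);
printf "connected: %s, waited %.3f s\n", ($ok ? 'yes' : 'no'), $elapsed;
```

Pointing timed_connect at an address that does not answer at all would be
expected to block for roughly the timeout given, which corresponds to the
2 minute hangs in the trace above.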


Questions:
==========
1. Can Radiator be changed/patched so that the 2 minute timeout value of
Net::LDAP can be overridden by a configurable parameter specified in
radius.cfg or on the command line?
2. Why does Radiator try to connect to server B a second time for the same
failed client request, even though the first attempt to connect to LDAP
server B has already failed? I would expect Radiator to send a REJECT (or
similar) as soon as it cannot connect to the LDAP server and is therefore
unable to satisfy the request.


Thanks and regards,
Holger

