Thanks for the info. The client is not using a NAT, so
the problem is probably the old Windows client. We have 500+
clients and we sometimes miss some, especially users that
configure their own systems and use AFS from their ISP
on their home systems, wireless, labs etc. That is why it
would be nice to have this fixed on the server side.

Jeffrey Altman wrote:
The combination of problems that you have experienced should
be solved in 1.4.1.  One of the issues that you were seeing is
that the client was contacting the server on a port other than
7001 and the server was attempting to break callbacks on port
7001.  Since the NAT doesn't have a port mapping from 7001 to
the client, the callbacks could not be broken.  Every time the
client would contact the server, the server would believe that
it had callbacks for the client that must be broken and would
block the incoming RPC until the callbacks could be broken.
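
To make the port mismatch concrete, here is a toy Python sketch. It is
not OpenAFS code: the Nat class, break_callback(), and the 10.0.0.5
address are made up for illustration. It only models the one fact
described above, that a NAT forwards inbound packets solely on ports it
has already mapped.

```python
from dataclasses import dataclass, field

AFS_CALLBACK_PORT = 7001  # port the server tries for callbacks, per the text above


@dataclass
class Nat:
    """Toy NAT (hypothetical): maps an external port to an internal endpoint."""
    mappings: dict = field(default_factory=dict)
    next_port: int = 40000

    def outbound(self, internal_host, internal_port):
        """Record a mapping for an outgoing packet and return the external port."""
        external_port = self.next_port
        self.next_port += 1
        self.mappings[external_port] = (internal_host, internal_port)
        return external_port

    def inbound(self, external_port):
        """Return the internal endpoint for an incoming packet, or None if dropped."""
        return self.mappings.get(external_port)


def break_callback(nat, port):
    """Return True if a server-initiated packet to `port` would reach the client."""
    return nat.inbound(port) is not None


nat = Nat()
# The client sends an RPC; the server sees it arrive from this external port.
seen_port = nat.outbound("10.0.0.5", AFS_CALLBACK_PORT)

reaches_on_7001 = break_callback(nat, AFS_CALLBACK_PORT)  # no mapping, dropped
reaches_on_seen = break_callback(nat, seen_port)          # mapping exists
```

In this model the callback to port 7001 is dropped while a packet to
the port the server actually saw gets through, which is the situation
that leaves the server believing it still has callbacks to break.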

The 1.3.77 client also has a serious bug that would cause it
to generate rapid fire requests using a new RX Connection for
each RPC.  If you still have 1.3.77 clients deployed, try your best to
upgrade them.

The 1.4.1 file server (to be announced real soon now) goes to
great lengths to track clients by both address and port number
and to deal with clients behind NATs so that each time the NAT
allocates a new port number to the client the relevant host
entry will be updated to track it.  This should provide a very
good NAT experience for end users that have AFS clients that
support UUIDs.  All of the OpenAFS clients for UNIX/Linux support
UUIDs and Windows clients 1.3.80 and later do.
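
As a rough illustration of the idea (not the actual file server data
structures), the following Python sketch keys host entries by client
UUID and overwrites the address and port on every contact. HostEntry,
HostTable, note_contact(), and callback_target() are hypothetical names
chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class HostEntry:
    """Hypothetical host entry: where to reach one client for callbacks."""
    uuid: str
    addr: str
    port: int


class HostTable:
    """Hypothetical table of host entries keyed by client UUID."""

    def __init__(self):
        self._by_uuid = {}

    def note_contact(self, uuid, addr, port):
        """Record the address and port a client was last seen on."""
        entry = self._by_uuid.get(uuid)
        if entry is None:
            # First contact from this UUID: create a new entry.
            entry = HostEntry(uuid, addr, port)
            self._by_uuid[uuid] = entry
        else:
            # Known client: the NAT may have allocated a new port, so update.
            entry.addr = addr
            entry.port = port
        return entry

    def callback_target(self, uuid):
        """Return the (addr, port) pair to use when breaking callbacks."""
        entry = self._by_uuid[uuid]
        return (entry.addr, entry.port)
```

Because the key is the UUID rather than the address, a client whose NAT
hands it a new port is still recognized as the same host, and only the
place to send callbacks changes.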

Jeffrey Altman


John W. Sopko Jr. wrote:
We have 3 OpenAFS 1.4.0 file servers running on Red Hat Enterprise
Linux 3 with the latest patches. This morning when I
came in the servers were very slow and not responding to
client requests; they were basically hung. This in turn
pretty much takes down all our web servers' file services
for home dirs etc.

I tracked this down to a "bad" AFS Windows client; the client
was running the old 1.3.77 version or may have had
a misconfigured firewall. I halted the "bad" client
and this fixed our server problems. I turned up
debugging on the file server (kill -TSTP) and used the messages
below to track this down. I searched the afs-info
archives and this problem was discussed in 2002 and was
supposed to get fixed. Is this
fixed in a version newer than 1.4.0? That is, not allowing
clients to bring down the server with bad callbacks. Thanks
for your input.

Tue Apr 18 10:20:19 2006 CB: RCallBackConnectBack failed for
152.2.128.182:7001
Tue Apr 18 10:22:27 2006 [12] CB: Call back connect back failed (in
break delayed) for 152.2.128.182:7001
Tue Apr 18 10:22:27 2006 [12] BreakDelayedCallbacks FAILED for host
152.2.128.182 which IS UP.  Possible network or routing failure.
Tue Apr 18 10:22:27 2006 [12] MultiProbe failed to find new address for
host 152.2.128.182:7001
Tue Apr 18 10:24:34 2006 [7] CB: WhoAreYou failed for
152.2.128.182:7001, error -03
Tue Apr 18 10:26:42 2006 [7] CB: Call back connect back failed (in break
delayed) for 152.2.128.182:7001
Tue Apr 18 10:26:42 2006 [7] BreakDelayedCallbacks FAILED for host
152.2.128.182 which IS UP.  Possible network or routing failure.
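
For anyone who wants to find such a client quickly, a small Python
sketch like the one below could count these failures per host. It
assumes the message wording shown in the excerpt above; the function
name failing_hosts() and the regular expression are mine, not part of
OpenAFS.

```python
import re
from collections import Counter

# Matches the dotted-quad host in "BreakDelayedCallbacks FAILED for host a.b.c.d".
FAILED_RE = re.compile(
    r"BreakDelayedCallbacks FAILED for host\s+(\d{1,3}(?:\.\d{1,3}){3})"
)


def failing_hosts(log_text):
    """Count BreakDelayedCallbacks failures per host address in log text."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter(FAILED_RE.findall(flat))
```

Calling failing_hosts() on the contents of the file server's FileLog
and looking at most_common() on the result would put the worst
offenders first. The location of FileLog depends on how the server was
installed.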

Here is the old post about this:

--------------------------------------------
From [EMAIL PROTECTED]  Tue Aug 27 12:13:13 2002
Date: Tue, 27 Aug 2002 18:12:59 +0200
From: FBO <[EMAIL PROTECTED]>
To: [email protected]


Hello,

We (Solaris 8, Transarc 3.6 2.32 servers, 3.6 2.26 db servers) had an
issue where a client with a certain firewall (Zone Alarm and/or Black
Ice) configuration (allowing AFS traffic out but no AFS traffic in, or
more precisely, it didn't allow any _uninitiated_ inbound AFS traffic,
e.g. a fileserver callback) caused the fileserver (a couple actually) to
slow to a crawl (reads/writes taking 10 minutes or more to complete) and
become virtually unusable.  We ended up having to block this firewalled
client machine to get the fileservers back to normal.  During the "outage",
FileLog would repeat the following message sequence every minute:

Wed Jul 10 16:22:55 2002 BreakDelayedCallbacks FAILED for host 894f2528
which IS UP.  Possible network or routing failure.
Wed Jul 10 16:22:55 2002 MultiProbe failed to find new address for
host 894f2528.7001
Wed Jul 10 16:23:51 2002 CB: Call back connect back failed (in break
delayed) for 894f2528.7001
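
The host in these messages is printed in hex rather than as a dotted
quad. Assuming the eight hex digits are the IPv4 address with the most
significant byte first (an assumption, since the log format is not
documented here), a short Python sketch can convert it. The helper name
hex_host_to_ip() is made up for this example.

```python
import ipaddress


def hex_host_to_ip(hex_host):
    """Convert an 8-digit hex host such as '894f2528' to dotted-quad form.

    Assumes the digits are the IPv4 address, most significant byte first.
    """
    return str(ipaddress.IPv4Address(int(hex_host, 16)))


address = hex_host_to_ip("894f2528")
```

If the bytes turn out to be in the opposite order on a given platform,
the octets of the result would need to be reversed.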

We have not been able to duplicate the problem but we've experienced it
2 to 3 times within about 3 months.

Below is the explanation I got from Transarc. They've informed us that a
fix is en route.  Has anybody ever experienced this in openafs (or
anywhere)?

--
John W. Sopko Jr.               University of North Carolina
email: sopko AT cs.unc.edu      Computer Science Dept., CB 3175
Phone: 919-962-1844             Sitterson Hall; Room 044
Fax:   919-962-1799             Chapel Hill, NC 27599-3175
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
